polyfuzz.metrics
precision_recall_curve(matches, precision_steps=0.01)
Calculate the precision-recall curve based on the minimum similarity between strings.
A minimum similarity score can be used to decide when a match is considered correct. For example, if a similarity score exceeds 0.95, we can be quite confident that the match is correct. This minimum similarity score is treated as precision, since it indicates how precise we believe the matches are at a minimum.
Recall is then defined as the percentage of matches found at a given minimum similarity score. A high recall means that, for a given minimum precision, many matches are found.
Parameters
Name | Type | Description | Default |
---|---|---|---|
matches | DataFrame | Contains the columns From, To, and Similarity used for calculating precision, recall, and average precision | required |
precision_steps | float | The incremental steps in minimum precision | 0.01 |
Returns
Type | Description |
---|---|
Tuple[List[float], List[float], List[float]] | min_precisions: minimum precision steps; recall: recall per minimum precision step; average_precision: average precision per minimum precision step |
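The definitions above can be illustrated with a minimal, dependency-free sketch. It is not PolyFuzz's source code: the function name `sketch_precision_recall` is hypothetical, it takes a plain list of similarity scores instead of a DataFrame, and it assumes that a match counts at a threshold when its similarity is greater than or equal to that threshold, and that the average precision is 0.0 when no match survives.

```python
from typing import List, Tuple


def sketch_precision_recall(similarities: List[float],
                            precision_steps: float = 0.01
                            ) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical illustration of the idea, not the library implementation."""
    if not similarities:
        raise ValueError("at least one similarity score is required")
    if precision_steps <= 0:
        raise ValueError("precision_steps must be positive")

    # Thresholds from 0.0 up to 1.0 in increments of precision_steps
    n_steps = int(round(1 / precision_steps))
    min_precisions = [round(i * precision_steps, 10) for i in range(n_steps + 1)]

    total = len(similarities)
    recall: List[float] = []
    average_precision: List[float] = []
    for threshold in min_precisions:
        # Assumption: a match is kept when its similarity is >= the threshold
        kept = [s for s in similarities if s >= threshold]
        # Recall: share of all matches that survive this threshold
        recall.append(len(kept) / total)
        # Mean similarity of the surviving matches (0.0 if none, by assumption)
        average_precision.append(sum(kept) / len(kept) if kept else 0.0)

    return min_precisions, recall, average_precision


scores = [0.99, 0.96, 0.80, 0.40]
min_precisions, recall, average_precision = sketch_precision_recall(
    scores, precision_steps=0.25
)
print(min_precisions)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(recall)          # [1.0, 1.0, 0.75, 0.75, 0.0]
```

With four scores and a step of 0.25, three of the four matches still pass the 0.5 and 0.75 thresholds, so recall is 0.75 there, and none reaches 1.0.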
visualize_precision_recall(matches, min_precisions, recall, kde=True, save_path=None)
Visualize the precision recall curve for one or more models
Parameters
Name | Type | Description | Default |
---|---|---|---|
matches | Mapping[str, pandas.core.frame.DataFrame] | Contains the columns From, To, and Similarity used for calculating precision, recall, and average precision per model | required |
min_precisions | Mapping[str, List[float]] | Minimum precision steps per model | required |
recall | Mapping[str, List[float]] | Recall per minimum precision step per model | required |
kde | bool | Whether to also visualize the KDE plot | True |
save_path | str | The path to save the resulting image to | None |
Usage:
visualize_precision_recall(matches, min_precisions, recall, save_path="data/results.png")
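Since `min_precisions` and `recall` are mappings keyed by model name, they have to be assembled per model before the call above. The sketch below shows one way this might be done. The helper `build_curve_inputs` and its `curve_fn` parameter are hypothetical and not part of PolyFuzz; `curve_fn` stands for any function that returns the `(min_precisions, recall, average_precision)` tuple described for `precision_recall_curve`.

```python
from typing import Any, Callable, Dict, List, Mapping, Tuple

# Any callable returning (min_precisions, recall, average_precision)
CurveFn = Callable[[Any], Tuple[List[float], List[float], List[float]]]


def build_curve_inputs(matches: Mapping[str, Any],
                       curve_fn: CurveFn
                       ) -> Tuple[Dict[str, List[float]], Dict[str, List[float]]]:
    """Hypothetical helper: collect per-model curve values into two dicts."""
    min_precisions: Dict[str, List[float]] = {}
    recall: Dict[str, List[float]] = {}
    for model_name, model_matches in matches.items():
        # The third element (average precision) is not needed for plotting
        steps, model_recall, _average_precision = curve_fn(model_matches)
        min_precisions[model_name] = steps
        recall[model_name] = model_recall
    return min_precisions, recall
```

Assuming PolyFuzz is installed, `curve_fn` would presumably be `precision_recall_curve` from `polyfuzz.metrics`, and the two returned dictionaries could then be passed to `visualize_precision_recall` together with the same `matches` mapping.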