polyfuzz.models.RapidFuzz
Calculate the edit distance between lists of strings using RapidFuzz's process function.
We use RapidFuzz instead of FuzzyWuzzy because it is much faster and is not distributed under the more restrictive GPL license.
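For intuition about what an edit-distance scorer returns, the sketch below reimplements a `fuzz.ratio`-style score in plain Python. It assumes RapidFuzz's definition of `fuzz.ratio` as a normalized Indel similarity (insertions and deletions only), which works out to 2 · LCS / (len(a) + len(b)) scaled to 0-100. The helper names `lcs_length` and `indel_ratio` are hypothetical, and RapidFuzz's own implementation is far faster.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (a sketch of fuzz.ratio)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # this sketch treats two empty strings as identical
    # Indel distance = total - 2 * LCS, so similarity = 2 * LCS / total
    return 100.0 * 2 * lcs_length(a, b) / total


print(round(indel_ratio("this is a test", "this is a test!"), 2))  # 96.55
```

Here the strings share a 14-character subsequence out of 29 characters in total, so the score is 28 / 29 ≈ 96.55.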
Parameters
Name | Type | Description | Default |
---|---|---|---|
n_jobs | int | Number of parallel processes; use -1 to use all cores | 1 |
score_cutoff | float | The minimum similarity for which to return a good match. Should be between 0 and 1. | 0 |
scorer | Callable | The scorer function used to calculate the edit distance. Options: `fuzz.ratio`, `fuzz.partial_ratio`, `fuzz.token_sort_ratio`, `fuzz.partial_token_sort_ratio`, `fuzz.token_set_ratio`, `fuzz.partial_token_set_ratio`, `fuzz.token_ratio`, `fuzz.partial_token_ratio`, `fuzz.WRatio`, `fuzz.QRatio`. See https://maxbachmann.github.io/rapidfuzz/usage/fuzz/ for an extensive description of the scoring methods. | `fuzz.WRatio` |
model_id | str | The name of the particular instance, used when comparing models | None |
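To make `scorer` and `score_cutoff` concrete, the following sketch picks the best candidate for a single string the way an `extractOne`-style lookup would: score every candidate, keep the highest, and drop it if it falls below the cutoff. It assumes the 0-1 `score_cutoff` is compared against the scorer's 0-100 output after multiplying by 100. `token_sort_score` and `best_match` are hypothetical helpers; `token_sort_score` only imitates the idea behind `fuzz.token_sort_ratio` (sort the words, then compare) and uses the standard library's `difflib.SequenceMatcher` instead of RapidFuzz's Indel metric.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


def token_sort_score(a: str, b: str) -> float:
    """Sort whitespace-separated tokens, then compare on a 0-100 scale.

    Stand-in for fuzz.token_sort_ratio, using difflib instead of RapidFuzz.
    """
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return 100.0 * SequenceMatcher(None, sorted_a, sorted_b).ratio()


def best_match(
    query: str,
    choices: List[str],
    scorer: Callable[[str, str], float] = token_sort_score,
    score_cutoff: float = 0.0,  # assumed to be on a 0-1 scale
) -> Optional[Tuple[str, float]]:
    """Return (best choice, score on 0-100), or None if nothing reaches the cutoff."""
    best: Optional[Tuple[str, float]] = None
    for choice in choices:
        score = scorer(query, choice)
        if best is None or score > best[1]:
            best = (choice, score)
    # Assumption: the 0-1 cutoff is rescaled to the scorer's 0-100 range.
    if best is None or best[1] < score_cutoff * 100:
        return None
    return best


print(best_match("new york mets", ["mets new york", "new jersey nets"]))
# ('mets new york', 100.0)
print(best_match("apple", ["orange", "grape"], score_cutoff=0.9))
# None
```

Swapping `scorer` changes how candidates are ranked, while `score_cutoff` only decides whether the top candidate is good enough to report.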
Usage:
```python
from polyfuzz.models import RapidFuzz
from rapidfuzz import fuzz

model = RapidFuzz(n_jobs=-1, score_cutoff=0.5, scorer=fuzz.WRatio)
```
match(self, from_list, to_list=None, **kwargs)
Calculate the edit distances between two lists of strings by parallelizing the calculation and passing the lists in batches.
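The batching and parallelization described above can be sketched with the standard library alone. This is not PolyFuzz's implementation: it splits `from_list` into fixed-size batches (the `batch_size` argument is hypothetical), scores each batch in a `concurrent.futures` thread pool, and returns one `(from, to, similarity)` row per input string with similarity on a 0-1 scale. `difflib.SequenceMatcher` stands in for the RapidFuzz scorer, the sketch requires an explicit `to_list`, and because pure-Python scoring is CPU-bound, a real speed-up would need a process pool rather than threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from functools import partial
from typing import List, Optional, Tuple

# One result row: (from string, best target or None, similarity on a 0-1 scale)
Row = Tuple[str, Optional[str], float]


def match_batch(batch: List[str], to_list: List[str], cutoff: float) -> List[Row]:
    """Find the best target in to_list for every string in one batch."""
    rows: List[Row] = []
    for source in batch:
        best_to: Optional[str] = None
        best_score = 0.0
        for target in to_list:
            # difflib stands in for a RapidFuzz scorer here (0-1 similarity)
            score = SequenceMatcher(None, source, target).ratio()
            if score > best_score:
                best_to, best_score = target, score
        if best_score < cutoff:
            best_to, best_score = None, 0.0  # nothing good enough
        rows.append((source, best_to, best_score))
    return rows


def parallel_match(from_list: List[str], to_list: List[str], n_jobs: int = 1,
                   batch_size: int = 100, cutoff: float = 0.0) -> List[Row]:
    """Split from_list into batches and score them concurrently (hypothetical helper)."""
    # Mirror the documented n_jobs convention: -1 means "use all cores"
    workers = (os.cpu_count() or 1) if n_jobs == -1 else max(1, n_jobs)
    batches = [from_list[i:i + batch_size] for i in range(0, len(from_list), batch_size)]
    worker = partial(match_batch, to_list=to_list, cutoff=cutoff)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so rows follow from_list
        return [row for rows in pool.map(worker, batches) for row in rows]


rows = parallel_match(["apple", "banana"], ["apples", "bananas", "cherry"],
                      n_jobs=-1, batch_size=1)
print([(source, target) for source, target, _ in rows])
# [('apple', 'apples'), ('banana', 'bananas')]
```

Each batch is independent, so batches can be scored in any order; using `map()` keeps the output rows aligned with the input order.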
Parameters
Name | Type | Description | Default |
---|---|---|---|
from_list | List[str] | The list from which you want mappings | required |
to_list | List[str] | The list you want to map to | None |
Returns
Type | Description |
---|---|
DataFrame | matches: The best matches between the lists of strings |
Usage:
```python
from polyfuzz.models import RapidFuzz
from rapidfuzz import fuzz

model = RapidFuzz(n_jobs=-1, score_cutoff=0.5, scorer=fuzz.WRatio)
matches = model.match(["string_one", "string_two"],
                      ["string_three", "string_four"])
```