Introduction
This document explains how to match names using FuzzyWuzzy in Python.
FuzzyWuzzy is a Python library for fuzzy string matching. It can be used to match the well names used in Corva against those used in a customer’s system.
An example of mismatched names: “J. Leviathon 19-18HC #1 ALT” and “J Leviathon 19-18HC No. 1-ALT”. FuzzyWuzzy helps to identify that these two names refer to the same well.
To use the fuzzywuzzy library, add it to requirements.txt:
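To make the idea of a similarity score concrete, the sketch below compares the two names with Python's standard-library difflib. This is only a stand-in for illustration, not FuzzyWuzzy's own scorer, and the helper names normalize and similarity are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase and turn runs of punctuation into single spaces (illustrative only).
    return " ".join(re.sub(r"[^0-9a-z]+", " ", name.lower()).split())


def similarity(a: str, b: str) -> int:
    # Scale difflib's 0..1 ratio to a 0..100 score.
    matcher = SequenceMatcher(None, normalize(a), normalize(b))
    return round(100 * matcher.ratio())


corva_name = "J Leviathon 19-18HC No. 1-ALT"
same_well = similarity(corva_name, "J. Leviathon 19-18HC #1 ALT")
other_well = similarity(corva_name, "Blanco 777H")
# The score for the same well is much higher than for the unrelated one.
print(same_well, other_well)
```

After normalization the two spellings differ only by the token "no", which is why they score far closer to each other than to an unrelated well name.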
fuzzywuzzy~=0.18.0
The function below takes two arguments:
- a pattern string (Corva’s well name)
- a list of candidates for matching (customer’s well names)
The return value is the index of the name in the list of candidates that best matches the pattern.
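from typing import List

from corva import Logger  # assumption: Logger is the one provided by corva-sdk
from fuzzywuzzy import process


def name_matching(name: str, candidates: List[str]) -> int:
    # extractWithoutOrder yields (candidate, score) pairs in the order of `candidates`.
    # Assumes `candidates` is not empty.
    ratios = list(process.extractWithoutOrder(name, candidates))
    highest, highest_ratio = ratios[0]
    highest_ratio_index = 0
    # Keep the first candidate with the highest score.
    for i in range(1, len(ratios)):
        if ratios[i][1] > highest_ratio:
            highest_ratio_index = i
            highest, highest_ratio = ratios[i]
    # A low score means the best match may not be the same well.
    if highest_ratio <= 67:
        Logger.info(
            f"Warning: The similarity score of '{name}' "
            f"and '{highest}' is <= 67: {highest_ratio}"
        )
    return highest_ratio_index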
Name matching usage
An example of name_matching usage:
corva_well_name = "J Leviathon 19-18HC No. 1-ALT"
# data for multiple wells
customer_data = [
    {
        "id": 433,
        "name": "Blanco 777H",
        "date_created": "2020-05-10T18:11:42Z",
        "data": {}
    },
    {
        "id": 843,
        "name": "White 1",
        "date_created": "2021-06-11T18:09:17Z",
        "data": {}
    },
    {
        "id": 845,
        "name": "J. Leviathon 19-18HC #1 ALT",
        "date_created": "2021-06-11T18:09:17.644528Z",
        "data": {}
    }
]
# create a list of candidates
candidates = [item["name"] for item in customer_data]
# find the index of the best match
best_match_index = name_matching(corva_well_name, candidates)
# extract data for the target well
data = customer_data[best_match_index]["data"]
Name matching function
This function can also be used to match other entities (e.g. names of fields, properties, etc.).
It writes a warning to the log if the similarity score between the pattern and the best match is 67 or lower (scores range from 0 to 100), because such a match may not refer to the same entity.
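Because name_matching returns the best index even when the score is low, a caller may prefer to treat a weak match as no match at all. A minimal sketch of such a variant follows. It takes precomputed (candidate, score) pairs in the same shape that process.extractWithoutOrder yields, so it runs without FuzzyWuzzy. The name best_index_or_none and the threshold parameter are illustrative assumptions, not part of the original function.

```python
from typing import Optional, Sequence, Tuple


def best_index_or_none(
    scored: Sequence[Tuple[str, int]], threshold: int = 67
) -> Optional[int]:
    # Return the index of the highest score, or None for an empty or weak result.
    if not scored:
        return None
    # max() keeps the first index on ties, like the strict ">" in name_matching.
    best_index = max(range(len(scored)), key=lambda i: scored[i][1])
    # Mirror the original check: scores at or below the threshold are suspicious.
    if scored[best_index][1] <= threshold:
        return None
    return best_index


# Hypothetical scores, made up for illustration only.
scored = [
    ("Blanco 777H", 30),
    ("White 1", 25),
    ("J. Leviathon 19-18HC #1 ALT", 95),
]
print(best_index_or_none(scored))                 # 2
print(best_index_or_none([("Blanco 777H", 30)]))  # None
```

Returning None forces the caller to decide what to do with an unmatched well instead of silently using data from the closest, but possibly wrong, candidate.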