LOOKUP
The LOOKUP function searches for a lookup value within a lookup array and returns an Excel Data Type containing the corresponding value from a return array, along with the similarity score and the matched text.
This function utilizes the jellyfish library to calculate similarity scores. For distance-based algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the result is normalized to a similarity score between 0 and 1 using the formula: 1 - \frac{\text{distance}}{\max(\text{length}_1, \text{length}_2)}.
Supported algorithms include Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Hamming.
Excel Usage
=LOOKUP(lookup_value, lookup_array, return_array, threshold, fuzzy_algo)
lookup_value(list[list], required): 2D range of text values to search for.lookup_array(list[list], required): 2D range of text values to search within.return_array(list[list], required): 2D range of values to return from.threshold(float, optional, default: 0.6): Minimum similarity score (0 to 1) required for a match.fuzzy_algo(str, optional, default: “levenshtein”): The string comparison algorithm to use.
Returns (list[list]): 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns “no match” if no result is found above the threshold.
Example 1: Levenshtein lookup
Inputs:
| lookup_value | lookup_array | return_array | threshold | fuzzy_algo |
|---|---|---|---|---|
| appl | apple | 1.5 | 0.7 | levenshtein |
| banana | 0.5 |
Excel formula:
=LOOKUP({"appl"}, {"apple";"banana"}, {1.5;0.5}, 0.7, "levenshtein")
Expected output:
{"type":"Double","basicValue":1.5,"properties":{"Result":{"type":"Double","basicValue":1.5},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"apple"}}}
Example 2: Jaro-Winkler lookup
Inputs:
| lookup_value | lookup_array | return_array | threshold | fuzzy_algo |
|---|---|---|---|---|
| dixon | dicksonx | User A | 0.8 | jaro_winkler |
| dixon | User B |
Excel formula:
=LOOKUP({"dixon"}, {"dicksonx";"dixon"}, {"User A";"User B"}, 0.8, "jaro_winkler")
Expected output:
{"type":"String","basicValue":"User B","properties":{"Result":{"type":"String","basicValue":"User B"},"Score":{"type":"Double","basicValue":1},"Match":{"type":"String","basicValue":"dixon"}}}
Example 3: Multiple lookup values
Inputs:
| lookup_value | lookup_array | return_array | threshold | fuzzy_algo |
|---|---|---|---|---|
| appl | apple | Fruit 1 | 0.7 | jaro_winkler |
| bana | banana | Fruit 2 | ||
| orange | Fruit 3 |
Excel formula:
=LOOKUP({"appl";"bana"}, {"apple";"banana";"orange"}, {"Fruit 1";"Fruit 2";"Fruit 3"}, 0.7, "jaro_winkler")
Expected output:
| Result |
|---|
| [object Object] |
| [object Object] |
Example 4: No match found
Inputs:
| lookup_value | lookup_array | return_array | threshold | fuzzy_algo |
|---|---|---|---|---|
| xyz | apple | 1 | 0.8 | levenshtein |
| banana | 2 |
Excel formula:
=LOOKUP({"xyz"}, {"apple";"banana"}, {1;2}, 0.8, "levenshtein")
Expected output:
{"type":"String","basicValue":"no match","properties":{"Result":{"type":"String","basicValue":""},"Score":{"type":"String","basicValue":""},"Match":{"type":"String","basicValue":""}}}
Python Code
Show Code
import jellyfish
def lookup(lookup_value, lookup_array, return_array, threshold=0.6, fuzzy_algo='levenshtein'):
"""
Returns the value from a return array corresponding to the best fuzzy match.
See: https://github.com/jamesturk/jellyfish
This example function is provided as-is without any representation of accuracy.
Args:
lookup_value (list[list]): 2D range of text values to search for.
lookup_array (list[list]): 2D range of text values to search within.
return_array (list[list]): 2D range of values to return from.
threshold (float, optional): Minimum similarity score (0 to 1) required for a match. Default is 0.6.
fuzzy_algo (str, optional): The string comparison algorithm to use. Valid options: Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, Hamming. Default is 'levenshtein'.
Returns:
list[list]: 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns "no match" if no result is found above the threshold.
"""
try:
def to2d(x):
if x is None: return [[""]]
return [[x]] if not isinstance(x, list) else x
def get_dt_type(val):
if isinstance(val, bool): return "Boolean"
if isinstance(val, (int, float)): return "Double"
return "String"
def get_similarity(s1, s2, algo):
s1, s2 = str(s1), str(s2)
if algo == "levenshtein":
dist = jellyfish.levenshtein_distance(s1, s2)
max_len = max(len(s1), len(s2))
return 1.0 - (dist / max_len) if max_len > 0 else 1.0
elif algo == "damerau_levenshtein":
dist = jellyfish.damerau_levenshtein_distance(s1, s2)
max_len = max(len(s1), len(s2))
return 1.0 - (dist / max_len) if max_len > 0 else 1.0
elif algo == "jaro":
return jellyfish.jaro_similarity(s1, s2)
elif algo == "jaro_winkler":
return jellyfish.jaro_winkler_similarity(s1, s2)
elif algo == "hamming":
if len(s1) != len(s2):
return 0.0
dist = jellyfish.hamming_distance(s1, s2)
return 1.0 - (dist / len(s1)) if len(s1) > 0 else 1.0
return 0.0
needles = to2d(lookup_value)
if not lookup_array or not return_array or len(lookup_array) != len(return_array) or len(lookup_array[0]) != len(return_array[0]):
return "Error: lookup_array and return_array must have the same dimensions."
haystack_flat = [item for sublist in lookup_array for item in sublist]
return_flat = [item for sublist in return_array for item in sublist]
results = []
for row in needles:
res_row = []
for needle in row:
if needle is None or str(needle).strip() == "":
res_row.append("")
continue
best_idx = -1
max_score = -1.0
matched_text = ""
for i, item in enumerate(haystack_flat):
if item is None:
continue
try:
score = float(get_similarity(needle, item, fuzzy_algo))
if score >= threshold and score > max_score:
max_score = score
best_idx = i
matched_text = str(item)
except Exception:
continue
if best_idx != -1:
val = return_flat[best_idx]
v_type = get_dt_type(val)
res_row.append({
"type": v_type,
"basicValue": val,
"properties": {
"Result": {"type": v_type, "basicValue": val},
"Score": {"type": "Double", "basicValue": float(max_score)},
"Match": {"type": "String", "basicValue": matched_text}
}
})
else:
res_row.append({
"type": "String",
"basicValue": "no match",
"properties": {
"Result": {"type": "String", "basicValue": ""},
"Score": {"type": "String", "basicValue": ""},
"Match": {"type": "String", "basicValue": ""}
}
})
results.append(res_row)
return results
except Exception as e:
return f"Error: {str(e)}"