0

I've been using this VBA solution by smirkingman from another similar question for calculating Levenshtein distance between strings. I have a need to translate this to an M code function in Excel Power Query, but don't have the know-how to do so.

Hoping someone can help me out. The 3 basic transformations between strings used in Levenshtein distance are below. Each counts as 1 step. More steps = greater distance between strings.

  • Insertion
  • Deletion
  • Substitution

I thought I could "cheat" and not use a For loop-type structure as shown in the VBA example, but the test results below show that I need a more robust solution.

let
    result = (s1 as text, s2 as text) as number => 
        List.Max({Text.Length(s1),Text.Length(s2)}) - List.Count(List.Intersect({Text.ToList(s1), Text.ToList(s2)}))
in
    result

Test Results

s1 s2 result explanation
pale pole 1 substitution
dole sale 2 substitution (x2)
pool spool 1 insertion
two one 2 (incorrect) substitution and/or insert/delete (3 steps min) EXPECTED: 3
dholt
  • 23
  • 5
  • If Python is an option, go with this. https://stackoverflow.com/questions/66266748/how-can-we-add-a-field-in-a-data-frame-to-find-the-distance-between-latitude-l I've used it in the past. It works fantastic!!! – ASH Dec 10 '21 at 04:44
  • If your goal is just match strings that are similar, I've found the various "fuzzy" routines of power query to be effective. – Ron Rosenfeld Dec 10 '21 at 12:02
  • @RonRosenfeld I have only seen fuzzy matching give strings a "Similarity" score, which I'm not sure uses Levenshtein distance. I not only need to determine distance between strings, I need to count the number of strings with certain distances (e.g. 1, 2, 3 steps from s2). – dholt Dec 10 '21 at 13:35
  • It may use it as part of the fuzzy algorithm, but it won't serve your purposes, I think. You can use Python scripts in Power BI desktop /power query but I don't know about using it in Excel/power query. Or you can use `List.Generate` to loop. – Ron Rosenfeld Dec 10 '21 at 14:12

0 Answers0