Is there an R package with a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and stringMatch from MiscPsycho is too slow for this.
31
4 Answers
20
stringdist in the stringdist package does it too, and under certain conditions it is even faster than levenshteinDist (1).
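As a minimal sketch of what this looks like in practice, assuming the stringdist package is installed: `method = "lv"` selects the Levenshtein distance, and the vectors `a` and `b` are made-up example data.

```r
library(stringdist)

# Hypothetical example data
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Element-wise Levenshtein distances between a[i] and b[i]
d <- stringdist(a, b, method = "lv")

# All pairwise distances between the elements of a and b, as a matrix
m <- stringdistmatrix(a, b, method = "lv")
```

Both functions also take an `nthread` argument if you want to control how many threads are used.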
Ben
stringdist has sped up significantly since the blog post you link to: it now uses multiple cores. – Feb 26 '16 at 17:02
17
levenshteinDist (from the RecordLinkage package) calls compiled C code. Give it a try.
MichaelChirico
George Dontas
Just noting the RecordLinkage package is apparently no longer maintained and has been pulled from CRAN. The `stringdist` package is the solution now. – Brian Stamper Feb 27 '20 at 17:42
6
You could try stringDist from the Biostrings package as well.
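A rough sketch of that call, assuming Biostrings is installed from Bioconductor (it is not on CRAN); the input vector `x` is made-up example data.

```r
library(Biostrings)

# Hypothetical example data
x <- c("kitten", "sitting", "mitten")

# Pairwise Levenshtein distances, returned as a "dist" object
d <- stringDist(x, method = "levenshtein")

# Convert to a full symmetric matrix for easier indexing
m <- as.matrix(d)
```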
MichaelChirico
Aaron Statham
1
You could also use levenshtein_distance() from the textTinyR package. I got 'calloc' memory errors with all the other packages on larger character vectors of around 30k characters; only textTinyR worked for me.
interrobang