Partial match strings from two columns in R

Question

I have two data frames in R, each with a character column holding an address. In most cases the strings are not exact matches but are fairly close, for example "510 East Bonham St" and "510 e bonham". I'd like to merge these datasets on partially matched strings (or full matches, if any) between the two columns.

I used gsub to make the columns somewhat more comparable, i.e., removing commas and periods, lowercasing, etc.
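For illustration, a cleaning step of this kind might look like the sketch below. The helper name `normalize_address` and the two abbreviation rules are assumptions made for the example, not the exact code used on the real data.

```r
# Hypothetical helper: normalize address strings before comparing them.
normalize_address <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x, perl = TRUE)   # commas, periods, etc. -> space
  # Illustrative abbreviation rules (assumption; extend for real data).
  x <- gsub("\\beast\\b", "e", x, perl = TRUE)
  x <- gsub("\\bstreet\\b", "st", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)          # collapse repeated whitespace
  trimws(x)
}

normalize_address(c("510 East Bonham St.", "510 e bonham"))
# [1] "510 e bonham st" "510 e bonham"
```

Even after this step the two example strings still differ (one keeps the trailing "st"), which is why an exact merge on the cleaned column is not enough.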

I have tried %like%, which works fairly well but often picks up a seemingly trivial pattern in the first string and matches it to nearly every string in the column from the second dataset.

Additionally, both data frames have columns for the city, county, and state of each observation, and I used these columns in conditional statements for some additional robustness. For example, %like% might match a string to, say, 10 addresses from the second data frame; I would then filter so that the cleaned city, county, and state variables were equal (%in%). This left a very small number of matched observations, which I know shouldn't be the case.
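One possible way to structure this combination is to reverse the order: restrict candidate pairs to the same city and state first, then rank the remaining pairs by edit distance. The following is only a minimal sketch using base R's `adist()`. The sample data, the column names (`id1`, `id2`, `address`, `city`, `state`), and the `max_rel_dist` cutoff are all made up for illustration and would need tuning on real data.

```r
# Made-up sample data (assumption: addresses are already cleaned).
df1 <- data.frame(
  id1     = 1:3,
  address = c("510 e bonham st", "12 main st", "99 oak ave"),
  city    = c("san marcos", "austin", "austin"),
  state   = c("tx", "tx", "tx"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id2     = 1:3,
  address = c("510 e bonham", "12 main street", "7 elm rd"),
  city    = c("san marcos", "austin", "dallas"),
  state   = c("tx", "tx", "tx"),
  stringsAsFactors = FALSE
)

# Step 1: block on exact city/state so only plausible pairs are compared.
pairs <- merge(df1, df2, by = c("city", "state"), suffixes = c("_1", "_2"))

# Step 2: Levenshtein distance per candidate pair, scaled by the longer string.
pairs$dist <- mapply(function(a, b) adist(a, b)[1, 1],
                     pairs$address_1, pairs$address_2, USE.NAMES = FALSE)
pairs$rel_dist <- pairs$dist /
  pmax(nchar(pairs$address_1), nchar(pairs$address_2))

# Step 3: keep the closest candidate per df1 row, if it is under the cutoff.
max_rel_dist <- 0.4  # hypothetical cutoff
pairs <- pairs[order(pairs$id1, pairs$rel_dist), ]
best  <- pairs[!duplicated(pairs$id1) & pairs$rel_dist <= max_rel_dist, ]

best[, c("id1", "id2", "address_1", "address_2", "rel_dist")]
```

With this toy data, rows 1 and 2 of `df1` pair with rows 1 and 2 of `df2`, while "99 oak ave" is left unmatched because its only candidate in the same city is too far away. Blocking before comparing keeps a short pattern from matching nearly every row in the other data frame.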

I also tried fuzzyjoin, with similar results, and am unsure what I am doing wrong.

Is there any function or package that can help with this task, or perhaps another way to go about this?

Thanks!

Another post asked a similar question; however, its context was choosing the better of two address columns.

before reopening, please provide a [minimal reproducible example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some sample data, e.g. with `dput()` and use the [reprex-package](https://reprex.tidyverse.org/). — mnist, Nov 29 '21 at 22:03
