I have two data frames that look like this:

test_a

x                y
2018-01-326      A
2018-01-457      B
2018-01-6557     C

test_b

z                    data
1-2018-01-326_S35    1
2-2018-01-457_S45    2
3-2018-01-6557_S77   3

I would like to merge the two data frames on their first columns, x in test_a and z in test_b. I could do this by cleaning the values in z so that they equal x exactly, but I would like to know whether it is possible to merge by detecting each x value as a partial match (substring) inside z. That approach would also work for more complex data, where cleaning the names is harder.

Desired output:

z                    x              data   y
1-2018-01-326_S35    2018-01-326    1      A
2-2018-01-457_S45    2018-01-457    2      B
3-2018-01-6557_S77   2018-01-6557   3      C

Test data:

test_a <- structure(list(x = c("2018-01-326", "2018-01-457", "2018-01-6557"
), y = c("A", "B", "C")), class = "data.frame", row.names = c(NA, 
-3L))

test_b <- structure(list(z = c("1-2018-01-326_S35", "2-2018-01-457_S45", 
"3-2018-01-6557_S77"), data = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))
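
For illustration, here is a minimal base R sketch of the partial matching described above. It assumes that each x value should be treated as a literal substring of z (no regular expressions), and the helper names `hits`, `idx` and `merged` are made up for this example. It is one possible approach, not necessarily the best one for larger data, since it compares every z against every x.

```r
test_a <- data.frame(x = c("2018-01-326", "2018-01-457", "2018-01-6557"),
                     y = c("A", "B", "C"))
test_b <- data.frame(z = c("1-2018-01-326_S35", "2-2018-01-457_S45",
                           "3-2018-01-6557_S77"),
                     data = c(1, 2, 3))

# hits[i, j] is TRUE when test_a$x[j] occurs literally inside test_b$z[i]
hits <- outer(test_b$z, test_a$x,
              Vectorize(function(z, x) grepl(x, z, fixed = TRUE)))

# One row per matching (z, x) pair, ordered by the row of test_b
idx <- which(hits, arr.ind = TRUE)
idx <- idx[order(idx[, "row"]), , drop = FALSE]

# Assemble the merged data frame in the column order of the desired output
merged <- data.frame(
  z    = test_b$z[idx[, "row"]],
  x    = test_a$x[idx[, "col"]],
  data = test_b$data[idx[, "row"]],
  y    = test_a$y[idx[, "col"]]
)
print(merged)
```

Note that plain substring matching can over-match: a shorter key such as "2018-01-32" would also be found inside "1-2018-01-326_S35", so real data may need delimiters around the key or a stricter pattern.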
    Have a look at the [fuzzyjoin](https://cran.r-project.org/web/packages/fuzzyjoin/fuzzyjoin.pdf) library, and at [this](https://stackoverflow.com/questions/61000838/fuzzy-left-join-with-r) question. – maydin Dec 14 '21 at 12:50
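
As a rough sketch of what the fuzzyjoin suggestion could look like on the test data: the code below assumes the fuzzyjoin and stringr packages are installed, and that a literal substring test is the matching rule wanted. The name `joined` is just for this example.

```r
library(fuzzyjoin)
library(stringr)

test_a <- data.frame(x = c("2018-01-326", "2018-01-457", "2018-01-6557"),
                     y = c("A", "B", "C"))
test_b <- data.frame(z = c("1-2018-01-326_S35", "2-2018-01-457_S45",
                           "3-2018-01-6557_S77"),
                     data = c(1, 2, 3))

# Keep the pairs of rows where the x value occurs literally inside z.
# match_fun receives the z values first and the x values second.
joined <- fuzzy_inner_join(
  test_b, test_a,
  by = c("z" = "x"),
  match_fun = function(z, x) str_detect(z, fixed(x))
)
print(joined)
```

Using `fixed()` keeps characters in the keys from being interpreted as regular expression syntax; swapping `fuzzy_inner_join` for `fuzzy_left_join` would keep rows of test_b that have no match.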