
I realize that dplyr 0.3 and later allow you to join on differently named variables:

left_join(x, y, by = c("a" = "b")) will match x.a to y.b

However, is it possible to join on a combination of variables or do I have to add a composite key beforehand?

Something like this:

left_join(x, y, by = c("a c" = "b d")) to match the concatenation of [x.a and x.c] to [y.b and y.d]

Mus
JasonAizkalns

1 Answer


Updated to use `tibble()`.

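You can pass a named vector with more than one element to the `by` argument of `left_join()`. Each name is a column in the left table and each value is the matching column in the right table; a row matches only when all the listed pairs are equal: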

library(dplyr)

d1 <- tibble(
  x = letters[1:3],
  y = LETTERS[1:3],
  a = rnorm(3)
  )

d2 <- tibble(
  x2 = letters[3:1],
  y2 = LETTERS[3:1],
  b = rnorm(3)
  )

left_join(d1, d2, by = c("x" = "x2", "y" = "y2"))
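A small sketch with hypothetical, deterministic data (the column names `id`, `grp`, `id2` and `grp2` are made up for illustration) shows that a multi-column `by` matches a row only when every listed pair of columns is equal. It also shows the `join_by()` helper, which assumes dplyr 1.1.0 or later; on older versions only the named-vector form is available.

```r
library(dplyr)

# Hypothetical data with fixed values so the result is predictable
d1 <- tibble(
  id  = c("a", "b", "c"),
  grp = c("A", "B", "Z"),
  a   = 1:3
)

d2 <- tibble(
  id2  = c("c", "b", "a"),
  grp2 = c("C", "B", "A"),
  b    = c(30, 20, 10)
)

# Named-vector form: a row matches only when id == id2 AND grp == grp2
res_vec <- left_join(d1, d2, by = c("id" = "id2", "grp" = "grp2"))

# Equivalent join_by() form (assumes dplyr >= 1.1.0)
res_jb <- left_join(d1, d2, by = join_by(id == id2, grp == grp2))

# b is 10, 20, NA: the third row of d1 agrees with d2 on id only, so it gets no match
res_vec$b
```

The right-hand key columns (`id2`, `grp2`) are dropped from the result by default, so `res_vec` has the columns `id`, `grp`, `a` and `b`.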
davechilders
    Thanks for this; it also works when the columns in the data frames have the same name, e.g. `left_join(d1, d2, by = c("firstname" = "firstname", "lastname" = "lastname"))`. This may not be obvious to some. – Anthony Simon Mielniczuk Jan 27 '18 at 14:41
    When the join columns are the same, you can also avoid the `=`: `left_join(d1, d2, by = c("firstname", "lastname"))` – davechilders Jan 27 '18 at 19:06
    Oof... I was holding out hope, but this appears to be an AND, which I suppose makes sense. I was hoping it'd be x = x2 OR y = y2, as I have multiple indexes built to try to identify duplicate but damaged entries across disparate resources. – Joshua Eric Turcotte Aug 21 '18 at 18:48
    The names don't have to be the same; they just have to be valid column names in the corresponding data frame, i.e. one can have a column "fname" and the other "firstname" and it will work just fine. – San Emmanuel James Jul 12 '20 at 02:10
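The named-vector `by` cannot express the OR-style match asked about in the Aug 2018 comment. One possible workaround, assuming dplyr 1.1.0 or later (for `cross_join()`) and tables small enough that the full n × m cross product fits in memory, is to pair every row and then filter on either key. The data and column names are hypothetical.

```r
library(dplyr)

# Hypothetical records; cross_join() assumes dplyr >= 1.1.0
d1 <- tibble(
  id  = c("a", "b", "c"),
  grp = c("A", "B", "Z"),
  a   = 1:3
)

d2 <- tibble(
  id2  = c("c", "x", "q"),
  grp2 = c("C", "B", "Q"),
  b    = c(30, 20, 10)
)

# Pair every row of d1 with every row of d2 (n * m rows),
# then keep the pairs where EITHER key agrees
or_matches <- cross_join(d1, d2) %>%
  filter(id == id2 | grp == grp2)

or_matches
```

This behaves like an inner join: the row of `d1` with `id` "a" matches nothing on either key and is dropped, while "b" matches on `grp` and "c" matches on `id`.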