2

My colleague and I are collecting data on separate files and we want to reconcile the data. Our data files look like so:

df1 = data.frame(Id = c(1:5), Score= c(10,NA,4,NA,3))
df2 = data.frame(UserID= c(1:5), Result= c(NA,8,NA,3,NA))

What is the simplest way to merge the two to form the following dataset to attain the following result?

df3 = data.frame(Id= c(1:5), Score= c(10,8,4,3,3))

Changing column names and using merge() don't seem to work which is what I had hoped.

Any suggestions? Would the quickest to be run a for loop across both datasets?

timnus
  • 93
  • 6

4 Answers4

2

you are looking for the function coalesce in the library dplyr:

dplyr::coalesce(df1,df2)
  Id Score
1  1    10
2  2     8
3  3     4
4  4     3
5  5     3
onyambu
  • 49,350
  • 3
  • 19
  • 45
1

use the dplyr library

library(dplyr)


inner_join(df1, df2, by = c("Id" = "UserID")) %>%  #specify the column names to join
  mutate(Score = ifelse (is.na(Score), Result, Score)) %>% #This will fill all the NA Scores with values of Result
  select(Id, Score) #Finally, Select the two columns you desire

Hope that helps

Wally Ali
  • 2,470
  • 1
  • 12
  • 19
0

Assuming that any NA in one df corresponds to a value in the other df, the simplest solution is by using an ifelse() statement to insert the values from df2 inlieu of the NA in df1, thus:

df1$Score <- ifelse(is.na(df1$Score), df2$Result, df1$Score)

This says: if df1$Score is NA, write the value of df2$Result, otherwise keep df1$Score

Chris Ruehlemann
  • 15,379
  • 3
  • 11
  • 27
0

With base R this should work as well:

df3 <- merge(df1, df2, by.x = c("Id", "Score"), by.y = c("UserID", "Result"), all = T)
df3 <- df3[!is.na(df3$Score),]
df3
Lennyy
  • 5,754
  • 2
  • 8
  • 22