2

I want to do something VERY similar to this question: how to use merge() to update a table in R

but instead of just one column being the index, I want to match the new values on an arbitrary number of columns >=1.

foo <- data.frame(index1=c('a', 'b', 'b', 'd','e'),index2=c(1, 1, 2, 3, 2), value=c(100,NA, 101, NA, NA))

Which has the following values

foo
  index1 index2 value
1      a      1   100
2      b      1    NA
3      b      2   101
4      d      3    NA
5      e      2    NA

And the data frame bar

bar <- data.frame(index1=c('b', 'd'),index2=c(1,3), value=c(200, 201))

Which has the following values:

 bar
  index1 index2 value
1      b      1   200
2      d      3   201

merge(foo, bar, by='index', all=T) It results in this output:

Desired output:

foo
  index1 index2 value
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA
Community
  • 1
  • 1
wolfsatthedoor
  • 6,697
  • 16
  • 43
  • 85
  • Is there a reason `index2` is numeric for one data frame and a character for the other ? Also, do you want to merge based on both `index1` and `index2` ? – steveb Aug 18 '16 at 22:24
  • No sorry, Arun edited it , thanks. yes, merge based on both indices – wolfsatthedoor Aug 18 '16 at 22:59

4 Answers4

4

I think you don't need a merge but more to rbind and filter them later. Here I am using data.table for sugar syntax.

dx <- rbind(bar,foo)
library(data.table)
setDT(dx)
## note this can be applied to any number of index
setkeyv(dx,grep("index",names(dx),v=T))
## using unqiue to remove all duplicated 
## here it will remove the duplicated with missing values which is the 
## expected behavior
unique(dx)

#    index1 index2 value
# 1:      b      1   200
# 2:      b      2   101
# 3:      d      3   201
# 4:      a      1   100
# 5:      e      2    NA

you can be more explicit and filter your rows by group of indexs:

 dx[,ifelse(length(value)>1,value[!is.na(value)],value),key(dx)]
agstudy
  • 116,828
  • 17
  • 186
  • 250
1

Here's an R base approach

> temp <- merge(foo, bar, by=c("index1","index2"), all=TRUE)
> temp$value <- with(temp, ifelse(is.na(value.x) & is.na(value.y), NA, rowSums(temp[,3:4], na.rm=TRUE)))
> temp <- temp[, -c(3,4)]
> temp
  index1 index2 value
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA
Jilber Urbina
  • 53,125
  • 10
  • 108
  • 134
0

You can use some dplyr voodoo to produce what you want. The following subsets the data by unique combinations of "index1" and "index2", and checks the contents of "value" for each subset. If "value" has any non-NA values, those are returned. If only an NA value is found, that is returned.

Seems a little specific, but it seems to do what you want!

library(dplyr)

df.merged <- merge(foo, bar, all = T) %>% 
  group_by(index1, index2) %>% 
  do(
    if (any(!is.na(.$value))) {
      return(subset(., !is.na(value)))
    } else {
      return(.)
    }
  )

Output:

  index1 index2 value
  <fctr> <fctr> <dbl>
1      a      1   100
2      b      1   200
3      b      2   101
4      d      3   201
5      e      2    NA
jdobres
  • 10,107
  • 1
  • 15
  • 35
0

You can specify as many columns as you want with merge:

out <- merge(foo, bar, by=c("index1", "index2"), all.x=TRUE)
new <- apply(out[,3:4], 1, function(x) sum(x, na.rm=TRUE))
new <- ifelse(is.na(out[,3]) & is.na(out[,4]), NA, new)
out <- cbind(out[,1:2], new)
AidanGawronski
  • 1,965
  • 1
  • 12
  • 24