1

Assume there are two tables a and b.

Table a:

ID AGE
 1  20
 2  empty
 3  40
 4  empty

Table b:

ID AGE
 2  25
 4  45
 5  60

How to merge the two table in R so that the resulting table becomes:

ID AGE
 1  20
 2  25
 3  40
 4  45
Brian Tompsett - 汤莱恩
  • 5,438
  • 68
  • 55
  • 126
xyin
  • 417
  • 2
  • 7
  • 19

3 Answers3

2

Assuming you have NA on every position in the first table where you want to use the second table's age numbers you can use rbind and na.omit.

Example

x <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA))
y <- data.frame(ID=c(2,4,5), AGE=c(25,45,60))
na.omit(rbind(x,y))

Results in what you're after (although unordered and I assume you just forgot ID 5)

ID AGE
1  20
3  40
2  25
4  45
5  60

EDIT

If you want to merge two different data.frames's and keep the columns its a different thing. You can use merge to achieve this.

Here are two data frames with different columns:

x <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA), COUNTY=c(1,2,3,4))
y <- data.frame(ID=c(2,4,5), AGE=c(25,45,60), STATE=c('CA','CA','IL'))

Add them together into one data.frame

res <- merge(x, y, by='ID', all=T)

giving us

ID AGE.x COUNTY AGE.y STATE
 1    20      1    NA  <NA>
 2    NA      2    25    CA
 3    40      3    NA  <NA>
 4    NA      4    45    CA
 5    NA     NA    60    IL

Then massage it into the form we want

idx <- which(is.na(res$AGE.x))                     # find missing rows in x
res$AGE.x[idx] <- res$AGE.y[idx]                   # replace them with y's values
names(res)[agrep('AGE\\.x', names(res))] <- 'AGE'  # rename merged column AGE.x to AGE
subset(res, select=-AGE.y)                         # dump the AGE.y column

Which gives us

ID AGE COUNTY STATE
 1  20      1  <NA>
 2  25      2    CA
 3  40      3  <NA>
 4  45      4    CA
 5  60     NA    IL
while
  • 3,402
  • 3
  • 30
  • 42
  • Thanks for the idea! Another question is: If table a and table b both have another column which is not shared in common, is there anyway to do the merge, fill the empty cells, and keep both extra columns in the resulting table? For example, table a has an extra column of STATE, table b has an extra column of COUNTY? Thanks! – xyin Apr 17 '15 at 15:43
  • Sure but then this wouldn't work. Ill add an edit with another solution to that problem. – while Apr 17 '15 at 15:52
2

You could try

library(data.table)
setkey(setDT(a), ID)[b, AGE:= i.AGE][]
#   ID AGE
#1:  1  20
#2:  2  25
#3:  3  40
#4:  4  45

data

a <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA))
b <- data.frame(ID=c(2,4,5), AGE=c(25,45,60))
akrun
  • 789,025
  • 32
  • 460
  • 575
1

The package in the other answer will work. Here is a dirty hack if you don't want to use the package:

x$AGE[is.na(x$AGE)] <- y$AGE[y$ID %in% x$ID]
> x
  ID AGE
1  1  20
2  2  25
3  3  40
4  4  45

But, I would use the package to avoid the clunky code.

Pierre L
  • 27,528
  • 5
  • 43
  • 64