I am using the merge function on two data frames, A and B:

nrow(A)  # 11537
nrow(B)  # 734

But when I apply the merge function as follows:

m <- merge(A,B,all.x=TRUE,by="id")

nrow(m)  # 29730

I get "m" with 29730 rows. I expected "m" to have only 11537 rows, since I am merging B into A with all.x=TRUE. I cannot identify the reason for this. Can somebody please help me? Where do the extra rows come from?

The file is big, so I cannot check it manually.

Ayush Raj Singh

1 Answer

If the id values aren't unique within each data.frame, merge creates one row for every possible combination of matching rows. For example:

a = data.frame(id=c(1,1,1,2,2),val=1:5)
b = data.frame(id=c(1,1,3,2,2),valb=11:15)
m = merge(a,b,by="id",all.x=TRUE)

m will have 10 rows: 6 with id=1 (3 rows in a times 2 rows in b) and 4 with id=2 (2 rows in a times 2 rows in b).

My guess is that this is what causes your merged data.frame to be bigger than expected.
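Since the file is too big to inspect by hand, here is a rough sketch of how you could check this programmatically. It reuses the toy a and b from the example above (substitute your own A and B), and the helper names dup_ids, n_match, expected_rows and b_unique are only illustrative. The last step keeps the first row per id in b, which is an assumption about what you want; whether that is appropriate depends on your data.

```r
# Toy data from the example above; substitute your own A and B.
a <- data.frame(id = c(1, 1, 1, 2, 2), val = 1:5)
b <- data.frame(id = c(1, 1, 3, 2, 2), valb = 11:15)

# 1. Which ids occur more than once in the right-hand table?
dup_ids <- unique(b$id[duplicated(b$id)])
dup_ids

# 2. Predict the row count of merge(a, b, by = "id", all.x = TRUE):
#    each row of a yields one output row per matching row of b,
#    or exactly one row (padded with NA) when nothing matches.
n_match <- vapply(a$id, function(i) sum(b$id == i), integer(1))
expected_rows <- sum(pmax(n_match, 1L))

m <- merge(a, b, by = "id", all.x = TRUE)
nrow(m) == expected_rows

# 3. If only one row per id is wanted from b (an assumption),
#    drop the repeated ids before merging. The result then has
#    exactly nrow(a) rows.
b_unique <- b[!duplicated(b$id), ]
m_unique <- merge(a, b_unique, by = "id", all.x = TRUE)
nrow(m_unique) == nrow(a)
```

If dup_ids comes back non-empty for B, the repeated ids are the source of the extra rows.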

amit
  • Hi @amit, thank you. I got the problem. I have put a question related to this. http://stackoverflow.com/questions/17297021/fromjson-is-not-able-to-convert-some-entries-in-csv-format-and-showing-name can you look it up and see if you could help me? Thank you. – Ayush Raj Singh Jun 25 '13 at 18:44