I have three datasets that I want to join to create a test set for a supervised machine learning algorithm. The problem is that although they have some variables in common, they generally differ in the number of rows and elements. I have tried the merge() function, but the more I use it, the fewer rows I get, and in the end I am left with a dataset with a ridiculously small number of rows.

I have these three datasets:

test_review   nrows 22956
test_business nrows  1205
test_user     nrows  5105

I want to keep the original number of reviews from the test_review dataset (22956) in the final test_set. The idea is that when a business or user has no match in test_review at the time of the merge(), it appears as an NA value in the corresponding new column of the merged dataset. Is there any way to do this?

Frank
Roy

1 Answer

You can try:

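library(plyr)
# rbind.fill() stacks the data frames row-wise and fills any column
# that is missing from a given data frame with NA
rbind.fill(test_review, test_business, test_user)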
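
If the goal is to keep exactly one row per review, a left join with base R's merge() and all.x = TRUE may be closer to what the question describes. The following is only a minimal sketch on toy data: the key column names business_id and user_id, and the columns city and fans, are assumptions for illustration, since the question does not say which variables the datasets share.

```r
# Toy stand-ins for the three data frames. The key column names
# (business_id, user_id) are assumptions, not taken from the question.
test_review <- data.frame(
  review_id   = 1:4,
  business_id = c("b1", "b2", "b9", "b1"),
  user_id     = c("u1", "u7", "u2", "u1"),
  stringsAsFactors = FALSE
)
test_business <- data.frame(
  business_id = c("b1", "b2"),
  city        = c("Leeds", "York"),
  stringsAsFactors = FALSE
)
test_user <- data.frame(
  user_id = c("u1", "u2"),
  fans    = c(3, 0),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the left data frame (a left join).
# Reviews with no matching business or user get NA in the new columns.
test_set <- merge(test_review, test_business, by = "business_id", all.x = TRUE)
test_set <- merge(test_set, test_user, by = "user_id", all.x = TRUE)

# Check that no reviews were lost or duplicated
nrow(test_set) == nrow(test_review)
```

The row count is only preserved if each business_id appears once in test_business and each user_id appears once in test_user; duplicated keys on the right-hand side would duplicate reviews. Note also that merge() sorts the result by the key column by default, so the row order may differ from test_review.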
moodymudskipper