
There are several questions and answers about this topic; however, none seems to answer this question directly, or I cannot locate the one that does. I appreciate the help in advance!

I have two data frames

df1 <- read.csv("df1.csv")
df2 <- read.csv("df2.csv")

I want to make

df3 <- data.frame([df1$LikeColumn != df2$LikeColumn],)

My goal is to make a data frame (df3) that consists of all observations (rows) where the two "LikeColumn" values are not equal.

Notes: The headers are the same (the df1$x header is the same as the df2$x header). There are the same number of columns. There are not the same number of rows.

Ronak Shah
Matt
  • So you want only the rows that are in one data frame, but not in both? Or only the rows from `df1` that are not in `df2`? A small toy example with desired output would help us understand your goal and give us something to demonstrate code on. – Gregor Thomas Feb 28 '18 at 19:39
  • only rows from df1 that are not in both – Matt Feb 28 '18 at 19:40
  • Can you make a reproducible example? http://reprex.tidyverse.org/articles/reprex.html – tonyk Feb 28 '18 at 19:40
  • df1 has 3 million records df2 has 12 million records i need those df1 records that are not in df2 – Matt Feb 28 '18 at 19:40
  • `df1[! df1$LikeColumn %in% df2$LikeColumn, ]`, or with `dplyr::anti_join(df1, df2, by = "LikeColumn")`. – Gregor Thomas Feb 28 '18 at 19:41
  • Possibly anti_join? https://www.rdocumentation.org/packages/dplyr/versions/0.7.3/topics/join – tonyk Feb 28 '18 at 19:42
  • This code makes no sense: `df1 – IRTFM Feb 28 '18 at 19:42
  • Possible duplicate of [Find complement of a data frame (anti - join)](https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join) – Maurits Evers Feb 28 '18 at 22:16

1 Answer

Using base R:

df1[! df1$LikeColumn %in% df2$LikeColumn, ]

With dplyr:

library(dplyr)
anti_join(df1, df2, by = "LikeColumn")
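Here is a minimal sketch of the base R approach on toy data, with the result assigned to `df3` so that it is kept as a data frame rather than only printed. The column name `LikeColumn` comes from the question; the values and the extra `value` column are made up for illustration.

```r
# Toy data: only the column name LikeColumn is from the question,
# the values and the `value` column are assumptions for illustration.
df1 <- data.frame(LikeColumn = c("a", "b", "c", "d"), value = 1:4)
df2 <- data.frame(LikeColumn = c("b", "d", "e"), value = c(20, 40, 50))

# Keep the rows of df1 whose LikeColumn value never appears in df2,
# and assign the result so it is stored under a new name.
df3 <- df1[!df1$LikeColumn %in% df2$LikeColumn, ]

# Printing df3 shows the two remaining rows ("a" and "c").
df3

# With dplyr installed, anti_join(df1, df2, by = "LikeColumn")
# should select the same rows.
```

Note that `%in%` binds more tightly than `!`, so the condition reads as "not (value is in df2$LikeColumn)".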

This question is closely related: Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2, but it focuses on finding full rows, whereas in this case we are only looking at values in a single column.

Also see Find complement of a data frame, which includes a data.table solution. That will likely be the most efficient option if you have large data and convert both data frames to keyed data tables.
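A rough sketch of what the keyed data.table version might look like, assuming the data.table package is installed. The toy values are made up; with real data frames you could convert in place with `setDT(df1)` and `setDT(df2)` instead of building the tables by hand.

```r
library(data.table)

# Toy data: values are assumptions for illustration.
dt1 <- data.table(LikeColumn = c("a", "b", "c", "d"), value = 1:4)
dt2 <- data.table(LikeColumn = c("b", "d", "e"), value = c(20, 40, 50))

# Keying sorts each table by LikeColumn so joins can use binary search.
setkey(dt1, LikeColumn)
setkey(dt2, LikeColumn)

# Anti-join: rows of dt1 that have no match in dt2 on LikeColumn.
dt3 <- dt1[!dt2, on = "LikeColumn"]
dt3
```

I have not benchmarked this against the base R or dplyr versions on data of the size mentioned in the comments, so treat the efficiency claim as something to verify on your own data.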

Gregor Thomas
  • Thanks for your help! I will look at the suggested threads. Please note that df1[! df1$LikeColumn %in% df2$LikeColumn, ] is only giving me the count in the console. It is not creating data frame @Gregor – Matt Feb 28 '18 at 21:11
  • The code I posted should work fine based on your description. If you provide a reproducible example as tonyk suggested in comments, then we can actually test and demonstrate the code (and see anything that might be weird with your data structures). Or maybe you just need to assign the result? `df3 – Gregor Thomas Feb 28 '18 at 21:39
  • i cant thank you enough first off! I am using Rstudio - i am referencing the console output window. When i run df1[! df1$LikeColumn %in% df2$LikeColumn, ] the console window displays the results but does not create a dataframe. when i run df1 – Matt Mar 01 '18 at 15:22
  • `df1[! df1$LikeColumn %in% df2$LikeColumn, ]` creates a data frame, it just doesn't assign it to anything to keep it around. If you want to assign that value to a new name, like `df3`, then do `df3 – Gregor Thomas Mar 01 '18 at 15:24
  • `foo + 1` will print 3, but not assign it to anything. If you want to change `foo` to increase its value by 1, then you do `foo – Gregor Thomas Mar 01 '18 at 15:27
  • In my answer, I just give the expression for the result, I didn't assign the value to anything because I don't know what you want to name the result. You can call it whatever you want. `whatever_you_want – Gregor Thomas Mar 01 '18 at 15:30
  • got it! thanks! i was missing the " – Matt Mar 01 '18 at 15:31
  • To be pedantic, you were missing the `df3 – Gregor Thomas Mar 01 '18 at 15:31
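The distinction discussed in these comments, between evaluating an expression and assigning its result, can be sketched as follows. The starting value of 2 for `foo` is an assumption, chosen so that `foo + 1` gives the 3 mentioned above.

```r
foo <- 2          # assumed starting value, chosen so that foo + 1 is 3

foo + 1           # evaluates and prints the result; foo itself is unchanged
foo               # still 2

foo <- foo + 1    # assignment stores the result back in foo
foo               # now 3
```

The same applies to the subsetting expression in the answer: running it on its own only prints the rows, while putting a name and `<-` in front of it stores them as a new data frame.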