I need to compare two data frames that describe the same things but were obtained in different ways.

So I need a data frame in which every value is compared with the corresponding value in the other data frame, giving TRUE if the values are identical and FALSE if they aren't.

Here is an example to explain it better:

df1

>    1  2  3  
> 1 AT GC CC 
> 2 AG GC CT 
> 3 GG TT <NA>

df2

>    1  2   3  
> 1 AT <NA> GG 
> 2 AG  GC  CG 
> 3 GG  TT  AA

result

>      1     2     3  
> 1 TRUE <NA>  FALSE 
> 2 TRUE TRUE  FALSE 
> 3 TRUE TRUE  <NA>

I found a solution here:

Comparing two similar dataframes and finding different values between them

but it doesn't work on my data when one of the data frames has an NA (R gave me TRUE).

Also, I expected that changing the order of the data frames in mapply() would give the same result, but that isn't the case for me. The data frames also have different factor levels, so df1 == df2 doesn't work.

I'd also like to know how to count the FALSE values in the result. Is there something like is.na() for that?

Thank you all.

mppd

2 Answers


We can just use == to get a logical matrix; the is.na() terms turn any comparison involving an NA into FALSE

(df1 == df2) & !is.na(df1) & !is.na(df2)
#    1     2     3
#1 TRUE FALSE FALSE
#2 TRUE  TRUE FALSE
#3 TRUE  TRUE FALSE

If the columns are of factor class, we can compare them column-wise as character with mapply/Map

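mapply(function(x, y) {
  # compare as character so differing factor levels don't matter
  i1 <- as.character(x) == as.character(y)
  # treat any comparison involving NA as FALSE
  replace(i1, is.na(i1), FALSE)
}, df1, df2)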
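
As an alternative sketch for the factor case, the columns can be converted to character up front, after which a plain == works and keeps NA wherever either side is missing (which matches the result table in the question). The data below is rebuilt from the question's example and assumed to be factors; the `df[] <- lapply(...)` conversion is a common idiom and an assumption here, not something the answer itself spells out.

```r
# Hypothetical reconstruction of the question's data as factor columns
# (the level sets differ between df1 and df2 on purpose).
df1 <- data.frame(`1` = factor(c("AT", "AG", "GG")),
                  `2` = factor(c("GC", "GC", "TT")),
                  `3` = factor(c("CC", "CT", NA)),
                  check.names = FALSE)
df2 <- data.frame(`1` = factor(c("AT", "AG", "GG")),
                  `2` = factor(c(NA, "GC", "TT")),
                  `3` = factor(c("GG", "CG", "AA")),
                  check.names = FALSE)

# Convert every column to character; assigning into df[] keeps the
# data.frame structure instead of producing a plain list.
df1[] <- lapply(df1, as.character)
df2[] <- lapply(df2, as.character)

# Element-wise comparison; NA on either side yields NA in the result.
res <- df1 == df2
print(res)
```

Because both operands are symmetric here, swapping df1 and df2 gives the same matrix.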

Or compare as matrices and then convert the NAs to FALSE

m1 <- as.matrix(df1) == as.matrix(df2)
m1[is.na(m1)] <- FALSE
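
The question also asks how to count the FALSE values. A minimal sketch, assuming character columns and using the question's example data: since a logical matrix sums TRUE as 1, negating it and summing with `na.rm = TRUE` counts the FALSE entries. The count is taken before any NA is overwritten, so missing comparisons are not counted as mismatches. The variable names (`m`, `n_false`, and so on) are made up for illustration.

```r
# Hypothetical reconstruction of the question's data as character columns.
df1 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c("GC", "GC", "TT"),
                  `3` = c("CC", "CT", NA),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c(NA, "GC", "TT"),
                  `3` = c("GG", "CG", "AA"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Compare as matrices; NA on either side stays NA.
m <- as.matrix(df1) == as.matrix(df2)

n_false       <- sum(!m, na.rm = TRUE)      # number of real mismatches
n_na          <- sum(is.na(m))              # comparisons involving an NA
false_per_col <- colSums(!m, na.rm = TRUE)  # mismatches per column

print(n_false)
print(n_na)
print(false_per_col)
```

If the NAs have already been replaced by FALSE, as in `m1` above, the same `sum(!m1)` would count missing comparisons together with the real mismatches.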
akrun
  • I've got this error: `Error in Ops.factor(left, right) : level sets of factors are different` – mppd May 02 '17 at 11:57
  • @mppd It is better to compare `character` class columns, `i.e. `df1[] – akrun May 02 '17 at 11:59
  • I've seen that if in one of the df I've NA, in the end I've NA, but also if it's a false statement. How can I keep an NA when df1 has NA and df2 does not (and vice versa), and also get FALSE when, for example, AT in df1 is different from AA in df2? – mppd May 04 '17 at 13:19
  • @mppd In that case, you just remove one of the `is.na` i.e. `(df1 == df2) & !is.na(df2)` – akrun May 04 '17 at 13:33
  • Last question and I'm done! (: what if there's `<NA>` in df1 and `<NA>` in df2 and I want to see `<NA>` too at the end? – mppd May 05 '17 at 08:33
  • @mppd In that case, just do `(df1 == df2)` – akrun May 05 '17 at 08:35
  • Very very very helpful! You made my day (and my job). Thank you! – mppd May 05 '17 at 08:49

Another possible option is to replace the NAs with the string 'NA' before comparing,

df1 == replace(df2, is.na(df2), 'NA')

or, if both data frames contain NAs,

replace(df1, is.na(df1), 'NA') == replace(df2, is.na(df2), 'NA')
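
A small sketch of how this behaves on the question's example, assuming character columns (with factor columns, 'NA' would first have to be a valid level). Every NA becomes the literal string 'NA', so the result contains no missing values: an NA compared with a real value gives FALSE, and an NA compared with another NA would give TRUE. One caveat that follows directly from the technique is that a genuine value spelled "NA" in the data would be indistinguishable from a missing one.

```r
# Hypothetical reconstruction of the question's data as character columns.
df1 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c("GC", "GC", "TT"),
                  `3` = c("CC", "CT", NA),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c(NA, "GC", "TT"),
                  `3` = c("GG", "CG", "AA"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Replace NA with the string "NA" on both sides, then compare.
res <- replace(df1, is.na(df1), "NA") == replace(df2, is.na(df2), "NA")
print(res)

# No NA is left in the result, so FALSE can be counted directly.
print(sum(!res))
```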
Sotos