
I'm trying to merge two data frames on a variable called hash_id. For some reason, R does not recognize the hash_id values in one of the data frames, while it does in the other.

I have checked, and I just don't get it. Here is how I checked:

> head(df1[46], 1) # so I take the first hash_id from df1 (column 46)
#    hash_id
# 1 abab123123

> which(df2 == "abab123123", arr.ind=TRUE) # here it shows that row 6847 contains a match
#      row col
# [1,] 6847  32

> which(df1 == "abab123123", arr.ind=TRUE) # and here there is NO matching value!
#     row col
# 
Thieme Hennis
    Could you show a reproducible example using `dput`? For example, `dput(head(df1, 20))`, or subset the relevant rows and `dput` those. Also, if it is a data frame with multiple columns, it is better to compare a single column: `df1[,"colName"]=='abab123123'`. – akrun Sep 19 '14 at 08:16
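The comment's suggestion can be sketched roughly as follows. `df1` here is a small hypothetical stand-in, since the real data is not shown. `dput` prints R code that recreates the object exactly, so hidden characters such as a trailing space show up inside the quotes.

```r
# Hypothetical stand-in for the asker's df1; the real data is not shown
df1 <- data.frame(hash_id = c("abab123123 ", "cdcd456456"),
                  value   = c(1, 2),
                  stringsAsFactors = FALSE)

# Print R code that rebuilds the first rows; the quotes expose stray spaces
dput(head(df1, 20))

# Or share only the key column; drop = FALSE keeps it a data frame
dput(df1[1:2, "hash_id", drop = FALSE])
```

Pasting that output into the question lets others rebuild the exact same object.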

1 Answer


One possibility is leading or trailing whitespace in the hash_id column of one of the data frames. You could strip it and compare again:

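library(stringr)

# Remove leading/trailing whitespace from the key column in both data frames
df1[, "hash_id"] <- str_trim(df1[, "hash_id"])
df2[, "hash_id"] <- str_trim(df2[, "hash_id"])

# If whitespace was the culprit, both should now return a row number
which(df1[, "hash_id"] == "abab123123")
which(df2[, "hash_id"] == "abab123123")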
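To see why this matters, here is a small self-contained sketch with made-up data (the IDs and the `value_a`/`value_b` columns are hypothetical). A single trailing space makes both `==` and `merge` miss the row, and trimming fixes both. It uses base R's `trimws` (R 3.2.0 and later) in place of `stringr::str_trim`, which does the same job here.

```r
# Hypothetical data: the same ID, but df1 carries a trailing space
df1 <- data.frame(hash_id = c("abab123123 ", "zzzz000000"),
                  value_a = c(1, 2), stringsAsFactors = FALSE)
df2 <- data.frame(hash_id = c("abab123123", "yyyy999999"),
                  value_b = c(10, 20), stringsAsFactors = FALSE)

which(df1$hash_id == "abab123123")     # integer(0): no exact match
nrow(merge(df1, df2, by = "hash_id"))  # 0: the keys differ

# Strip leading/trailing whitespace from the key in both data frames
df1$hash_id <- trimws(df1$hash_id)
df2$hash_id <- trimws(df2$hash_id)

which(df1$hash_id == "abab123123")     # 1
merged <- merge(df1, df2, by = "hash_id")
merged                                 # one row: abab123123, 1, 10
```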

Another way would be to use grepl with word boundaries, which matches the ID even when it is surrounded by spaces:

grepl("\\babab123123\\b", df1[,"hash_id"])
grepl("\\babab123123\\b", df2[,"hash_id"])
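A quick sketch of the difference, on a hypothetical vector `ids`: `==` needs an exact match, while the `\\b` word boundaries let `grepl` find the ID despite surrounding spaces, though not when it is glued to other word characters. `nchar` is one simple way to spot the extra characters in the first place.

```r
# Hypothetical IDs: exact, leading space, trailing space, extra prefix
ids <- c("abab123123", " abab123123", "abab123123 ", "xabab123123")

ids == "abab123123"             # TRUE FALSE FALSE FALSE
grepl("\\babab123123\\b", ids)  # TRUE TRUE TRUE FALSE
nchar(ids)                      # 10 11 11 11
```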
akrun