How to compare the contents of a column of the same data table

Question

I have this table:

and I want to get the rows in which the first three columns are equal, like this:

I've tried the following, but when I get the index of the rows, R doesn't give the output that I want:

df$obj<-sapply(c("sample1", "sample2", "sample3"), function (comparison) {
    df$sample1 == df[comparison]
})

Following on from my dplyr suggestion on your previous question: filter(df, sample1 == sample2 & sample1 == sample3). Note that a and A are not the same. If you want case-insensitive matching, use filter(df, toupper(sample1) == toupper(sample2) & toupper(sample1) == toupper(sample3)). – heathobrien, May 18 '18 at 16:40
But that way you only get a comparison between two columns, you see? You can have the same result between sample1 and sample2, but when you go on to compare sample1 and sample3, the result can be different from the previous one. – Sofia, May 18 '18 at 16:53
I'm not sure what you mean. If sample1 == sample2 and sample1 == sample3, then sample2 has to be equal to sample3, unless I'm misunderstanding your goal. My code will turn your example input into your desired output, though. – heathobrien, May 18 '18 at 17:05

zx8754 · Answer 1 · 2018-05-22T10:36:30.467

4

Loop through the first 3 columns row by row using apply, convert each row to upper case with toupper (as we want a to count the same as A), take the unique values, and count them. If the count is 1, then all 3 values are equal:

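# keep rows where columns 1 to 3 hold a single unique value, ignoring case;
# counting inside the function keeps the result a plain vector even when
# every row has the same number of unique values
df1[ apply(df1[, 1:3], 1, function(i) length(unique(toupper(i)))) == 1, ]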

This scales better if we want to compare 4, 10, or n columns.
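To illustrate the scaling point, here is a minimal sketch of the same idea wrapped in a function that takes any set of columns. The helper name rows_all_equal, its ignore_case argument, and the example data ex are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the answer): keep the rows whose values
# in `cols` are all equal, optionally ignoring case.
rows_all_equal <- function(df, cols, ignore_case = TRUE) {
  m <- as.matrix(df[, cols, drop = FALSE])
  if (ignore_case) m <- toupper(m)  # toupper keeps the matrix dimensions
  # count distinct values per row; exactly one means all columns agree
  n_unique <- apply(m, 1, function(i) length(unique(i)))
  df[n_unique == 1, , drop = FALSE]
}

# Assumed example data, with mixed case in some cells
ex <- data.frame(
  sample1 = c("a", "C", "A", "C"),
  sample2 = c("A", "A", "A", "C"),
  sample3 = c("A", "A", "A", "c"),
  sample4 = c("A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

res3 <- rows_all_equal(ex, 1:3)  # compare the first three columns
res4 <- rows_all_equal(ex, 1:4)  # compare all four columns
```

Only the cols argument changes when more columns are compared, so the call stays the same length for 3 columns or 10.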

edited May 22 '18 at 10:36

answered May 21 '18 at 09:49

zx8754


score 3 · Answer 2 · edited May 22 '18 at 13:42


There is no need to use dplyr; this can be done in base R. See the example below:

# reproducible example data
df1 <- data.frame(
  t(data.frame(
    one = c("A", "A", "A", "C", "C", "C"),
    two = c("A", "A", "A", "A", "A", "C"), 
    thr = c("A", "A", "A", "A", "A", "A"), 
    fou = c("C", "A", "A", "A", "A", "A"))), 
  stringsAsFactors = FALSE)

df1
#     X1 X2 X3 X4 X5 X6
# one  A  A  A  C  C  C
# two  A  A  A  A  A  C
# thr  A  A  A  A  A  A
# fou  C  A  A  A  A  A

# solution, subset using logical index:
selection <- df1[df1[, 1] == df1[, 2] & df1[, 1] == df1[, 3], ]

selection
#     X1 X2 X3 X4 X5 X6
# one  A  A  A  C  C  C
# two  A  A  A  A  A  C
# thr  A  A  A  A  A  A
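The logical index above is case-sensitive. If a and A should count as equal, as raised in the comments on the question, the columns can be upper-cased before comparing. This is a minimal sketch under that assumption; the data frame df2 and the names up, keep and selection2 are made up for illustration.

```r
# Assumed example data: some rows differ only by case
df2 <- data.frame(
  X1 = c("a", "A", "C"),
  X2 = c("A", "A", "A"),
  X3 = c("A", "a", "A"),
  stringsAsFactors = FALSE
)

# upper-case the three columns once, then compare as before
up <- lapply(df2[, 1:3], toupper)
keep <- up[[1]] == up[[2]] & up[[1]] == up[[3]]

# subset using the case-insensitive logical index
selection2 <- df2[keep, ]
```

Here the first two rows are kept because their values differ only by case, and the third row is dropped because C does not match A.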

edited May 22 '18 at 13:42

zx8754


answered May 18 '18 at 16:44

benn



To be fair, you don't really need dplyr for anything. I just think it's easier to wrap your head around filter than a command that repeats the name of the data frame 5 times, along with 5 sets of square brackets and 5 commas. Personal preference though, I guess. – heathobrien May 18 '18 at 17:00
@heathobrien, you are right, dplyr is a great tool, and I think for some difficult tasks even necessary. For row selection with conditional statements I am used to selecting this (old-fashioned) way. But dplyr would certainly be good as well. – benn May 18 '18 at 17:08

t(data.frame(…)) is an odd pattern, since t always converts data.frames to matrices, and assigning the result to a variable called df is actively misleading. – Konrad Rudolph May 22 '18 at 10:14
@KonradRudolph, that was to make a workable example. I understand that you don't like my style (also from previous comments), feel free to put your own answer. – benn May 22 '18 at 10:16
@b.nota Actually I have no problem with the style of this answer, I think it actually works well otherwise. But, to clarify, my previous comment wasn’t about style. – Konrad Rudolph May 22 '18 at 10:24
@b.nota hope you don't mind the edit. Generally I avoid function names (df). And converted your matrix to a dataframe by wrapping it with another dataframe. Your solution works just fine. – zx8754 May 22 '18 at 10:51
