
I have this table:

[image: input table]

and I want to keep only the rows in which the values in the first three columns are all equal, like this:

[image: desired output]

I've tried the code below, but when I use the result to index the rows, R doesn't give the output I want:

df$obj<-sapply(c("sample1", "sample2", "sample3"), function (comparison) {
    df$sample1 == df[comparison]
})
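The attempt above builds one logical comparison per column. Here is a minimal sketch of one way to finish that idea. It is not taken from the thread, and it uses small made-up data with the column names sample1, sample2 and sample3 from the question. Each column is taken with df[[col]], which returns a plain vector, whereas df[col] returns a one-column data frame. The per-column logical vectors are then combined with Reduce, so a row is kept only when every comparison is TRUE. Comparing sample1 with itself is always TRUE, so only sample2 and sample3 are checked.

```r
# Hypothetical example data; column names follow the question (sample1..sample3)
df <- data.frame(
  sample1 = c("A", "A", "C", "G"),
  sample2 = c("A", "T", "C", "G"),
  sample3 = c("A", "A", "C", "T"),
  stringsAsFactors = FALSE
)

# One logical vector per comparison; df[[col]] extracts the column as a vector
matches <- lapply(c("sample2", "sample3"), function(col) df$sample1 == df[[col]])

# Combine with element-wise AND: TRUE only where sample1 equals every other column
keep <- Reduce(`&`, matches)

# Use the combined logical vector as the row index
df[keep, ]
```

Because keep always has one element per row, it can be used directly as a row index. More columns can be added to the character vector passed to lapply.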
Jason Aller
Sofia
  • following on from my dplyr suggestion on your previous question: filter(df, sample1 == sample2 & sample1 == sample3). Note that a and A are not the same. If you want case-insensitive matching, use filter(df, toupper(sample1) == toupper(sample2) & toupper(sample1) == toupper(sample3)) – heathobrien May 18 '18 at 16:40
  • But that way you only compare two columns at a time, you see? Sample 1 and sample 2 can match, but when you compare sample 1 with sample 3, the result can be different from the previous one. – Sofia May 18 '18 at 16:53
  • I'm not sure what you mean. If sample1 == sample2 and sample1 == sample3, then sample2 has to be equal to sample3. Unless I'm misunderstanding your goal. My code will turn your example input into your desired output though – heathobrien May 18 '18 at 17:05

2 Answers

Loop over the first 3 columns row by row using apply, convert the values to upper case with toupper (as we want a to match A), keep the unique values, and count them with length. If a row has exactly 1 unique value, all 3 values are equal:

df1[ apply(df1[, 1:3], 1,
           function(i) length(unique(toupper(i)))
           ) == 1, ]

This scales better if we want to compare 4, 10, or any number of columns: only the column index 1:3 needs to change.
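The apply() approach processes one row at a time. An alternative technique, not the answerer's, vectorizes the same check for any number of columns. It upper-cases the selected columns into a character matrix, compares every column with the first one, and keeps the rows with zero mismatches. This is a sketch with hypothetical data and column names. It assumes there are no missing values, because an NA would make rowSums return NA for that row.

```r
# Hypothetical example data with four columns to compare
df1 <- data.frame(
  s1 = c("A", "a", "C", "G"),
  s2 = c("A", "A", "C", "G"),
  s3 = c("A", "A", "T", "G"),
  s4 = c("A", "a", "C", "g"),
  stringsAsFactors = FALSE
)

cols <- 1:4  # columns to compare; any number of columns works

# Upper-case every value so that "a" matches "A" (toupper keeps the matrix shape)
m <- toupper(as.matrix(df1[, cols]))

# m[, 1] is recycled down each column, so m[i, j] is compared with m[i, 1];
# a row qualifies when it has no mismatches (assumes no NA values)
all_equal <- rowSums(m != m[, 1]) == 0

df1[all_equal, ]
```

Rows 1, 2 and 4 are kept here. Rows 2 and 4 differ from each other only in case.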

zx8754

There is no need to use dplyr; we can do this in base R, as in the example below:

# reproducible example data
df1 <- data.frame(
  t(data.frame(
    one = c("A", "A", "A", "C", "C", "C"),
    two = c("A", "A", "A", "A", "A", "C"), 
    thr = c("A", "A", "A", "A", "A", "A"), 
    fou = c("C", "A", "A", "A", "A", "A"))), 
  stringsAsFactors = FALSE)

df1
#     X1 X2 X3 X4 X5 X6
# one  A  A  A  C  C  C
# two  A  A  A  A  A  C
# thr  A  A  A  A  A  A
# fou  C  A  A  A  A  A

# solution, subset using logical index:
selection <- df1[df1[, 1] == df1[, 2] & df1[, 1] == df1[, 3], ]

selection
#     X1 X2 X3 X4 X5 X6
# one  A  A  A  C  C  C
# two  A  A  A  A  A  C
# thr  A  A  A  A  A  A
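The logical index above is case-sensitive, so a and A count as different values (the point heathobrien raised under the question). Below is a minimal sketch of a case-insensitive variant that uses hypothetical data in which row fiv differs only in case. The three columns are upper-cased once. The same logical index is built from the upper-cased copy, and the original df1 is still subsetted, so the returned rows keep their original values.

```r
# Hypothetical data: row "fiv" matches row "one" except for the case of X1
df1 <- data.frame(
  X1 = c("A", "C", "a"),
  X2 = c("A", "A", "A"),
  X3 = c("A", "A", "A"),
  row.names = c("one", "fou", "fiv"),
  stringsAsFactors = FALSE
)

# Upper-cased copy of the first three columns, used only for comparison
up <- toupper(as.matrix(df1[, 1:3]))

# Same logical index as the answer, built from the upper-cased copy
selection <- df1[up[, 1] == up[, 2] & up[, 1] == up[, 3], ]

selection
```

With the case-sensitive index, row fiv would be dropped. Here it is kept, and its X1 value is still a.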
zx8754
benn
  • To be fair, you don't really need dplyr for anything. I just think it's easier to wrap your head around filter than a command that repeats the name of the data frame 5 times, along with 5 sets of square brackets and 5 commas. Personal preference though, I guess. – heathobrien May 18 '18 at 17:00
  • @heathobrien, you are right, dplyr is a great tool, and I think for some difficult tasks it is even necessary. For row selection with conditional statements I am used to selecting rows this (old-fashioned) way, but dplyr would certainly work well too. – benn May 18 '18 at 17:08
  • t(data.frame(…)) is an odd pattern, since t always converts data.frames to matrices, and assigning the result to a variable called df is actively misleading. – Konrad Rudolph May 22 '18 at 10:14
  • @KonradRudolph, that was to make a workable example. I understand that you don't like my style (also from previous comments); feel free to post your own answer. – benn May 22 '18 at 10:16
  • @b.nota Actually, I have no problem with the style of this answer; I think it works well otherwise. But, to clarify, my previous comment wasn't about style. – Konrad Rudolph May 22 '18 at 10:24
  • @b.nota hope you don't mind the edit. Generally I avoid using function names (df) as variable names. I also converted your matrix to a data frame by wrapping it in another data.frame call. Your solution works just fine. – zx8754 May 22 '18 at 10:51