32

There is a similar question for PHP, but I'm working with R and am unable to translate the solution to my problem.

I have this data frame with 10 rows and 50 columns, where some of the rows are absolutely identical. If I use unique on it, I get one row per - let's say - "type", but what I actually want is to get only those rows which only appear once. Does anyone know how I can achieve this?

I can have a look at clusters and heatmaps to sort it out manually, but I have bigger data frames than the one mentioned above (with up to 100 rows) where this gets a bit tricky.

Jaap
  • 77,147
  • 31
  • 174
  • 185
Lilith-Elina
  • 1,454
  • 4
  • 18
  • 30

3 Answers3

81

This will extract the rows which appear only once (assuming your data frame is named df):

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

How it works: The function duplicated tests whether a line appears at least for the second time starting at line one. If the argument fromLast = TRUE is used, the function starts at the last line.

Boths boolean results are combined with | (logical 'or') into a new vector which indicates all lines appearing more than once. The result of this is negated using ! thereby creating a boolean vector indicating lines appearing only once.

Sven Hohenstein
  • 78,180
  • 16
  • 134
  • 160
10

A possibility involving dplyr could be:

df %>%
 group_by_all() %>%
 filter(n() == 1)

Or:

df %>%
 group_by_all() %>%
 filter(!any(row_number() > 1))

Since dplyr 1.0.0, the preferable way would be:

data %>%
    group_by(across(everything())) %>%
    filter(n() == 1)
tmfmnk
  • 36,341
  • 4
  • 40
  • 53
1

Try it

library(dplyr)

DF1 <- data.frame(Part = c(1,2,3,4,5), Age = c(23,34,23,25,24),  B.P = c(87,76,75,75,78))

DF2 <- data.frame(Part =c(3,5), Age = c(23,24), B.P = c(75,78))

DF3 <- rbind(DF1,DF2)

DF3 <- DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), ]
Brutalroot
  • 176
  • 1
  • 7