
My dummy data looks like this:

> head(dummy)
            C1          C2
[1,]         1           1
[2,]         1           2
[3,]         1           3
[4,]         2           3
[5,]         2           4
[6,]         2           5

The value 3 is duplicated in C2, although the two rows containing it are distinct rows of the data frame. I want to remove every row whose C2 value is a duplicate, keeping only the first or the last occurrence according to C1.

Example of what I want:

> remove duplicates leave first in C1
            C1          C2
[1,]         1           1
[2,]         1           2
[3,]         1           3
[5,]         2           4
[6,]         2           5
# filtered    [4,]   2    3

Or

> remove duplicates leave last in C1
            C1          C2
[1,]         1           1
[2,]         1           2
[4,]         2           3
[5,]         2           4
[6,]         2           5
# filtered   [3,]   1    3
Jaap
pogibas

1 Answer


If dat is the dataset, this keeps the first occurrence of each C2 value:

dat[with(dat, !duplicated(C2)),]
 C1 C2
1  1  1
2  1  2
3  1  3
5  2  4
6  2  5


To keep the last occurrence of each C2 value instead:

dat[with(dat, !duplicated(C2, fromLast=TRUE)),]
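
Note that duplicated() goes by row order, so "first" and "last" follow C1 only if the rows are already sorted by C1, as they are in the example. Here is a minimal sketch that sorts explicitly first. It assumes dat is a data frame with columns C1 and C2; the [1,] row labels in the question suggest the original object may be a matrix, in which case as.data.frame(dummy) would be needed before using with() or $. The names sorted, keep_first and keep_last are only illustrative.

```r
# Rebuild the example data from the question (assumed to be a data frame).
dat <- data.frame(C1 = c(1, 1, 1, 2, 2, 2),
                  C2 = c(1, 2, 3, 3, 4, 5))

# Sort by C1 so that "first" and "last" are defined by C1, not by input order.
sorted <- dat[order(dat$C1), ]

# For each C2 value, keep the row that comes first (smallest C1).
keep_first <- sorted[!duplicated(sorted$C2), ]

# For each C2 value, keep the row that comes last (largest C1).
keep_last <- sorted[!duplicated(sorted$C2, fromLast = TRUE), ]

print(keep_first)
print(keep_last)
```

With the example data, keep_first drops row 4 and keep_last drops row 3, matching the two desired outputs in the question.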
akrun