
Hello, I have a data frame such as:

COL1 COL2 COL3           COL4
NA   NA   Sp_canis_lupus 10
3    8    Sp_canis_lupus 10
3    8    Sp_canis_lupus 10 

How can I remove rows that are duplicated in COL3 and keep only the last one?

Here I should get:

COL1 COL2 COL3           COL4
3    8    Sp_canis_lupus 10 
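For a reproducible version of the input, the data frame can be rebuilt with data.frame(). The values are copied from the table above; storing COL1, COL2 and COL4 as numbers and COL3 as character is an assumption about the real column types.

```r
# Rebuild the sample data frame from the question
# (assumed types: numeric COL1/COL2/COL4, character COL3)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10),
  stringsAsFactors = FALSE
)
print(df)
```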

Thank you very much for your help.

Ruben Helsloot
chippycentra

3 Answers


You could also solve this with aggregate, like below:

aggregate(. ~ COL3, data = df, FUN = tail, 1)
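A self-contained run of the aggregate approach on the sample data (rebuilt here with data.frame(); the column types are an assumption). One detail worth knowing: the formula method drops rows containing NA before grouping (its default is na.action = na.omit). Passing na.action = na.pass keeps them; on this data both give the same result because the last row has no NA.

```r
# Sample data from the question (column types are an assumption)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10),
  stringsAsFactors = FALSE
)

# Default: the NA row is removed first, then tail(x, 1) keeps the
# last value of every other column within each COL3 group
res_omit <- aggregate(. ~ COL3, data = df, FUN = tail, 1)

# Same call, but rows with NAs are kept before grouping
res_pass <- aggregate(. ~ COL3, data = df, FUN = tail, 1, na.action = na.pass)

print(res_omit)
print(res_pass)
```

Note that the result lists the grouping column COL3 first, followed by the remaining columns.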

Or another way in dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  slice(n())

This assumes you only care about duplicates in COL3; if not, you'll need to describe the problem in more detail, since the example is fairly simple.

arg0naut91

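Use duplicated to find duplicates and then select the rows that are not duplicated, i.e. x[!duplicated(x), ]. Applied to the whole data frame, this compares complete rows, so the row with NAs would be kept. You may need a more elaborate statement, e.g. checking only COL3 and passing fromLast = TRUE so that the last occurrence is kept instead of the first: df[!duplicated(df$COL3, fromLast = TRUE), ].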
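A small base R sketch contrasting the two uses of duplicated() on the sample data (rebuilt with data.frame(); the column types are an assumption). The whole-row check keeps the NA row, while the COL3-only check with fromLast = TRUE keeps just the last row.

```r
# Sample data from the question (column types are an assumption)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10),
  stringsAsFactors = FALSE
)

# duplicated() on the whole data frame compares complete rows:
# row 3 repeats row 2, but the NA row 1 is unique, so it survives
whole_rows <- df[!duplicated(df), ]

# Check only COL3 and scan from the end, so the last occurrence is kept
last_per_col3 <- df[!duplicated(df$COL3, fromLast = TRUE), ]

print(whole_rows)
print(last_per_col3)
```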

Roman Luštrik

Using dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  filter(row_number() == n())  # keep only the last row of each COL3 group
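To show what filter(row_number() == n()) is doing, here is a base R sketch of the same logic (a translation, not part of the original answer): ave() computes, for each COL3 group, each row's position within the group and the group size, and only rows where the two match are kept. The sample data is rebuilt with data.frame(), and the column types are an assumption.

```r
# Sample data from the question (column types are an assumption)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10),
  stringsAsFactors = FALSE
)

idx <- seq_along(df$COL3)
pos_in_group <- ave(idx, df$COL3, FUN = seq_along)  # analogous to row_number()
group_size   <- ave(idx, df$COL3, FUN = length)     # analogous to n()

# Keep the row whose position equals its group size, i.e. the last one
last_rows <- df[pos_in_group == group_size, ]
print(last_rows)
```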
