I have a data frame and I want to remove the last N rows from it. To remove 5 rows, I currently use the following command, which in my opinion is rather convoluted:

df <- df[-seq(nrow(df), nrow(df) - 4), ]

How would you accomplish this task? Is there a convenient function that I can use in R?

In Unix, I would use:

tac file | sed '1,5d' | tac 
zx8754
Alby

4 Answers

`head()` with a negative `n` is convenient for this: it returns all rows except the last `|n|`.

df <- data.frame(a = 1:10)
head(df, -5)
#  a
#1 1
#2 2
#3 3
#4 4
#5 5

P.S. Your seq() example may be written slightly less(?) awkwardly using the named arguments by and length.out (which partial argument matching lets you shorten to len), like this: -seq(nrow(df), by = -1, len = 5).
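As a quick check, the sketch below compares the index vectors discussed above on a small example data frame. It uses base R only; the names `idx_original`, `idx_named`, and the `via_*` variables are made up for illustration.

```r
df <- data.frame(a = 1:10, b = letters[1:10])

# Index vector from the question: counts down from the last row
idx_original <- seq(nrow(df), nrow(df) - 4)

# The same rows, using the named arguments by and length.out
idx_named <- seq(nrow(df), by = -1, length.out = 5)

# Negative indices drop those rows; head() with a negative n
# keeps everything except the last 5 rows
via_original <- df[-idx_original, ]
via_named <- df[-idx_named, ]
via_head <- head(df, -5)

# Compare the three results
all.equal(via_original, via_head)
all.equal(via_named, via_head)
```

All three approaches are intended to select the same rows; `head()` simply avoids building the index vector by hand.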

Simon O'Hanlon
  • There's an edge case! `head(df, -0) == head(df,0) != df` – peer Nov 23 '18 at 13:41
  • @peer sorry, I don't think I understand your comment. Can you illustrate the edge case more fully? – Simon O'Hanlon Nov 23 '18 at 15:41
  • I'm switching from `df[0:(nrow(df)-n),]` to `head`. In my case the user moves a slider to indicate that the last `n` rows are to be removed. But there's a catch! When the user sets `n=0`, we would expect no rows to be removed. With `head(df, -n)`, however, all rows are removed, because negative zero is resolved to positive zero, which means "take the first 0 rows". So I want to warn others who set `n` dynamically and allow `n=0`: you'll need `if (n > 0) df=head(df, -n)` – peer Nov 23 '18 at 17:33
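The `n = 0` edge case described in the comments can be handled with a small wrapper. This is only a sketch: `drop_last_rows` is a hypothetical helper name, not a function from base R or any package.

```r
# Hypothetical helper: drop the last n rows of a data frame,
# treating n = 0 as "drop nothing".
drop_last_rows <- function(df, n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 0)
  if (n == 0) {
    return(df)  # head(df, -0) would return zero rows instead
  }
  head(df, -n)
}

df <- data.frame(a = 1:10)

nrow(head(df, -0))           # 0: the pitfall
nrow(drop_last_rows(df, 0))  # 10: nothing removed
nrow(drop_last_rows(df, 5))  # 5
nrow(drop_last_rows(df, 20)) # 0: asking for more rows than exist leaves none
```

The guard mirrors the `if (n > 0)` check suggested in the comment, just packaged so that callers cannot forget it.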
This one takes one more line, but is far more readable:

n <- dim(df)[1]
df <- df[1:(n - 5), ]

Of course, you can do it in one line by putting the dim call directly into the re-assignment statement. I assume this is part of a reproducible script, so you can retrace your steps. Otherwise, I strongly recommend saving the result to a different variable (e.g., df2) and removing the redundant copy only after you are sure you got what you wanted.

Assaf
  • While the `head` solution is probably preferable, you could also use `nrow(df)` instead of `dim(df)[1]`. – thelatemail Jan 16 '14 at 00:31
  • intuitive one-liner based on your suggestion: ``d – PatrickT Oct 31 '17 at 15:52
  • This solution actually removes the rownames of the data frame for me, while the accepted answer (using `head()`) doesn't, so I would not recommend this option. – Brunox13 May 17 '21 at 18:30
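Two pitfalls of the `1:(n - 5)` form are worth illustrating, and they may or may not be what the comment above ran into. With a single-column data frame, `df[rows, ]` drops to a plain vector unless `drop = FALSE` is given, and `1:(n - 5)` counts backwards when `n - 5` is zero. The sketch below uses `seq_len()` instead; the variable names are illustrative only.

```r
df <- data.frame(a = 1:10)
n_drop <- 5

# Single-column data frame: the original form returns a bare vector
as_vector <- df[1:(nrow(df) - n_drop), ]
is.data.frame(as_vector)  # FALSE

# seq_len() yields an empty index when there is nothing left to keep,
# and drop = FALSE keeps the result a data frame
keep <- seq_len(max(nrow(df) - n_drop, 0))
trimmed <- df[keep, , drop = FALSE]
is.data.frame(trimmed)    # TRUE

# Removing every row: 1:0 is c(1, 0), which still selects row 1
small <- data.frame(a = 1:5, b = letters[1:5])
nrow(small[1:(nrow(small) - 5), ])         # 1
nrow(small[seq_len(nrow(small) - 5), ])    # 0
```

`head(df, -n_drop)` avoids both issues, which is one reason it is usually the simpler choice.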
Adding a dplyr answer for completeness:

library(dplyr)

# tibble() replaces the deprecated data_frame()
test_df <- tibble(a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  b = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
slice(test_df, 1:(n() - 5))

# A tibble: 5 x 2
#      a b    
#  <dbl> <chr>
#1     1 a    
#2     2 b    
#3     3 c    
#4     4 d    
#5     5 e    
Oscar

Another dplyr answer which is even more readable:

df %>% filter(row_number() <= n()-5)
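For reference, dplyr 1.0.0 and later also provide `slice_head()`, whose `n` argument may be negative, in which case it is subtracted from the number of rows. The sketch below assumes such a dplyr version is installed; `test_df` and `n_drop` are example names.

```r
library(dplyr)

test_df <- tibble(a = 1:10, b = letters[1:10])
n_drop <- 5

# A negative n keeps all rows except the last n_drop
result <- slice_head(test_df, n = -n_drop)

# Same rows as the filter() version above
via_filter <- test_df %>% filter(row_number() <= n() - n_drop)

nrow(result)
all.equal(result, via_filter)
```

As with `head()`, `n = 0` selects zero rows rather than all of them, so a dynamic `n_drop` that can be zero still needs a guard.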
Edgar