Disclaimer: I am a total R noob.

I have a data frame from which I want to remove rows based on a column value. For instance, in the following example, I want to remove the row that contains 'B' in col3.

> df <- data.frame(col1=c(1,2,3,4),col2=c(5,6,7,8),col3=c('A','B','C','D'))
> df
  col1 col2 col3
1    1    5    A
2    2    6    B
3    3    7    C
4    4    8    D

I found out that this can be done with the subset() function.

> subdf <- subset(df,col3 != 'B')
> subdf
  col1 col2 col3
1    1    5    A
3    3    7    C
4    4    8    D

So far so good. However, I was surprised to find that when I plot the data from this subset (which is my ultimate goal with my real data), the 'B' value is still accounted for, despite being completely absent from the subset. The same thing shows up with the summary() function: 'B' is still there, but with a frequency of zero:

> summary(subdf$col3)
A B C D 
1 0 1 1 
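
A quick way to check what is going on, assuming col3 is stored as a factor (the default for strings in data.frame() before R 4.0.0). The stringsAsFactors = TRUE argument is not in my original call; it is added here only so the sketch behaves the same on newer R versions:

```r
# Assumption: col3 is a factor; stringsAsFactors = TRUE forces that on R >= 4.0.0.
df <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8),
                 col3 = c('A', 'B', 'C', 'D'), stringsAsFactors = TRUE)
subdf <- subset(df, col3 != 'B')

class(subdf$col3)    # "factor"
levels(subdf$col3)   # still lists "B", although no remaining row uses it
table(subdf$col3)    # counts per level, including the now-empty "B" level
```

If the column is a factor, the leftover 'B' would come from the factor's level set rather than from any link back to df.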

I assume this is the expected behavior. Is there any way to 'unlink' the subset from the original data frame? Or is there another way to create a subset that returns a new, independent data frame?

I also tried with df[df['col3']!='B',], but it has the same behavior (which I am sure was totally predictable).
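
For reference, a minimal sketch of one possible way to get rid of the empty 'B' entry, again assuming col3 is a factor (stringsAsFactors = TRUE is my addition so this reproduces on R 4.0.0 and later). It relies on base R's droplevels(), which drops unused levels from the factor columns of a data frame:

```r
# Assumption: col3 is a factor; stringsAsFactors = TRUE forces that on R >= 4.0.0.
df <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8),
                 col3 = c('A', 'B', 'C', 'D'), stringsAsFactors = TRUE)
subdf <- subset(df, col3 != 'B')

# Drop factor levels that no longer occur in any row of the subset.
subdf <- droplevels(subdf)

levels(subdf$col3)    # "A" "C" "D"
summary(subdf$col3)   # no zero-count entry for "B" any more

# Single-column alternative: rebuild the factor from the values still present.
# subdf$col3 <- factor(subdf$col3)
```

Both subset() and the df[...] indexing already return a separate data frame; under this assumption, only the factor's level set is carried over, so dropping the unused levels should be enough for plotting.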

Alex