I have a data frame like:

       Domain         Phylum          Class          Order
ID_1 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_2 Bacteria  Cyanobacteria Unclassified_c Unclassified_o
ID_3 Bacteria  Bacteroidetes Unclassified_c Unclassified_o
ID_4 Bacteria Proteobacteria Unclassified_c Unclassified_o
ID_5 Bacteria  Bacteroidetes Unclassified_c Unclassified_o

and I want to replace every occurrence of the strings Unclassified_c, Unclassified_o, element_3, etc. with NA, so I tried:

df[df == "Unclassified_c" ] <- NA

This works well when I replace one value at a time, but sometimes there are too many values. So I would like to define a list of patterns and use that instead, something like:

Remove_list <- c("Unclassified_c", "Unclassified_o", "element_3", "element_4", "element_x")

and then use the list to replace those values with NA:

df[ df == Remove_list ] <- NA 

It changes some of the values to NA, but not all of them. I don't want to use the stringr library, because it drops the row names (ID_1 .. ID_x) and I need them, so I would like to do this in base R. Any suggestions?

Thanks so much!

Jaap
abraham

1 Answer

We can use sapply with %in%, which returns a logical matrix indicating whether each value is present in Remove_list. We can then assign NA wherever that matrix is TRUE.

df[sapply(df, `%in%`, Remove_list)] <- NA

df
#       Domain         Phylum Class Order
#ID_1 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_2 Bacteria  Cyanobacteria  <NA>  <NA>
#ID_3 Bacteria  Bacteroidetes  <NA>  <NA>
#ID_4 Bacteria Proteobacteria  <NA>  <NA>
#ID_5 Bacteria  Bacteroidetes  <NA>  <NA>
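
For anyone who wants to try this from scratch, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the printout in the question, assuming the columns are plain character vectors (the question does not say whether they are factors), and `to_na` is just an illustrative name. As far as I can tell, the original `df == Remove_list` only catches some cells because `==` compares element by element and recycles the shorter vector, whereas `%in%` checks each cell against the whole vector.

```r
# Rebuild the example data from the question.
# Assumption: columns are character, not factor.
df <- data.frame(
  Domain = rep("Bacteria", 5),
  Phylum = c("Cyanobacteria", "Cyanobacteria", "Bacteroidetes",
             "Proteobacteria", "Bacteroidetes"),
  Class  = rep("Unclassified_c", 5),
  Order  = rep("Unclassified_o", 5),
  row.names = paste0("ID_", 1:5),
  stringsAsFactors = FALSE
)

# Note the c(): a bare ("a", "b") is a syntax error in R.
Remove_list <- c("Unclassified_c", "Unclassified_o", "element_3",
                 "element_4", "element_x")

# One logical column per data frame column: TRUE where the cell's value
# appears anywhere in Remove_list.
to_na <- sapply(df, `%in%`, Remove_list)

# Overwrite only the flagged cells; everything else is left untouched.
df[to_na] <- NA

print(df)
```

Because the assignment only touches the flagged cells, the row names (ID_1 .. ID_5) stay as they were.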
Ronak Shah