2

Assume the following datatable:

DT <- data.table(a=c(1,2,3,4,5,6),b=c(NaN,NaN,NaN,4,5,6),c=c(NaN,3,3,3,NaN,NaN))

How can I replace all NaN values by NA, i.e. in each column? Looking for an elegant (short) way to do this, but I am also wondering if it is possible using lapply, like I tried below.

My approach so far:

DT[,lapply(SD,function(x){x[is.nan(x)] := NA}),.SDcols=c("a","b","c")]  

Actual result of my code is:

Error in :=(x[is.nan(x)], NA) : Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").

markus
  • 24,556
  • 5
  • 34
  • 51
User878239
  • 539
  • 4
  • 13
  • 1
    See end of Matt Dowle's answer here: https://stackoverflow.com/a/7249454/8583393 You would need `for(j in names(DT)) { set(DT, which(is.nan(DT[[j]])), j, NA) }` – markus Jan 09 '19 at 21:21
  • @markus thanks, that's it. I didn't see this post before, thanks. Need to use a good old for loop with `set` it seems :) – User878239 Jan 09 '19 at 21:25
  • 1
    Glad it worked. `set` is actually super fast so don't get 'blinded' by the loop. – markus Jan 09 '19 at 21:26

2 Answers2

4

You can do it easily using dplyr's mutate_all function.

DT <- DT %>% mutate_all(~ifelse(is.nan(.), NA, .))
print(DT)
#a  b  c
# 1 NA NA
# 2 NA  3
# 3 NA  3
# 4  4  3
# 5  5 NA
# 6  6 NA

The code above is equivalent to

DT %>% mutate_all(function(x) ifelse(is.nan(x), NA, x))

Here's one way to do it using data.table syntax. I'm not sure if there is a simpler way.

DT[, names(DT) := lapply(.SD, function(x) ifelse(is.nan(x), NA, x))]

EDIT: An alternative way to do this, as posted by markus in the comments beow.

DT[, lapply(.SD, function(x) replace(x, is.nan(x), NA))]
Dave Rosenman
  • 1,092
  • 8
  • 12
  • 1
    thanks, but isn't that `dplyr` syntax? I'd like to stick to datatable if possible just for one operation. – User878239 Jan 09 '19 at 21:19
  • I edited my response with a way to do it using data.table syntax. I mostly use dplyr, so I'm not sure if there's a simpler way to do it using data.table than the way I posted. – Dave Rosenman Jan 09 '19 at 21:27
  • thanks, this new approach works, but I also want to point out to markus solution in the comments. I will accept it for your efforts and because it works. – User878239 Jan 09 '19 at 21:43
  • 1
    @DavidRosenman The advantage of using `set` is that it does the replacement [by reference](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html). With your solution you'd need to do `DT – markus Jan 09 '19 at 21:56
2

You want something like this:

DT[DT == "NaN"] <- NA

This should ideally find all the NaN that you have and replace it with NA. If anyone knows a better way please let me know!

Hope this helps!

Lasarus9
  • 69
  • 9
  • 1
    thanks, but my NaNs are not characters so your code does not work: Error in `[.data.table`(DT, DT == "NaN") : i is invalid type (matrix) – User878239 Jan 09 '19 at 21:17
  • 1
    In addition, NaN does not equate to true with any value, including with itself. See `is.nan` for the proper test. Otherwise, you have the right idea. – Matthew Lundberg Jan 09 '19 at 23:25