107

Basically, I'm doing some data analysis. I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or by being a string written "NA").

I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?

zebra

1 Answer

179
>>> import numpy as np
>>> a = np.array([[1,2,3], [4,5,np.nan], [7,8,9]])
>>> a
array([[  1.,   2.,   3.],
       [  4.,   5.,  nan],
       [  7.,   8.,   9.]])

>>> a[~np.isnan(a).any(axis=1)]
array([[ 1.,  2.,  3.],
       [ 7.,  8.,  9.]])

Then assign the result back to a, e.g. a = a[~np.isnan(a).any(axis=1)].

Explanation: np.isnan(a) returns a boolean array of the same shape as a, with True where the value is NaN and False elsewhere. .any(axis=1) reduces the m*n array to a length-m array (one entry per row) by applying a logical OR across each row, ~ inverts True/False, and a[ ] selects only those rows of the original array for which the boolean index inside the brackets is True.
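
To make the steps in the explanation easier to follow, here is the same one-liner split into its intermediate results. This is only a sketch: the names nan_mask, row_has_nan and keep are illustrative and not part of the original answer, and it assumes a is a float array.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])

# Step 1: element-wise test, same shape as a (3x3 here)
nan_mask = np.isnan(a)

# Step 2: logical OR across each row, giving one boolean per row (length 3)
row_has_nan = nan_mask.any(axis=1)

# Step 3: invert, so True marks the rows to keep
keep = ~row_has_nan

# Step 4: boolean indexing selects the kept rows; assign back to a
a = a[keep]
```

Note that np.isnan only works on numeric arrays, so string entries such as "NA" would have to be converted to NaN before this step.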

eumiro
  • 12
    `np.isfinite` is also useful in this case, as well as when you want to get rid of `±Inf` values. It doesn't require the `~`, since it returns true only for finite reals. – naught101 Sep 07 '16 at 23:16
  • 9
    @naught101 You also need to change `any` to `all`. Since you want to select rows where "all are finite", instead of selecting rows where "not any are nan". – AnnanFay Jun 06 '17 at 04:59
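
Putting the two comments above together, a minimal sketch of the np.isfinite variant might look like this. The sample array and the names finite_rows and cleaned are made up for illustration; it assumes a float array and that rows containing ±Inf should be dropped along with rows containing NaN.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, np.nan],
              [7, np.inf, 9],
              [10, 11, 12]])

# True only for rows in which every entry is a finite number
# (neither NaN nor +/-Inf), so no ~ is needed here
finite_rows = np.isfinite(a).all(axis=1)

cleaned = a[finite_rows]
```

Here all replaces any because the condition is phrased positively ("every entry is finite") rather than negatively ("no entry is NaN").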