5

I am trying to interpolate a 2D array that contents masked data. I have used some of the SciPy module's methods available, including interp2d, bisplrep/bisplev, as well as RectBivariateSpline. As an additional information, my data is a regular array, which means that grids have the same dimension (in this case 1ºX1º).

Having said that, is there any way to interpolate avoiding masked data in an array with Python? I am still new using Python and NumPy/SciPy modules.

hurrdrought
  • 446
  • 4
  • 14
  • 1
    Actually I don't think there are a lot of interpolations out there that ignore masked data. At least not in the standard libraries like numpy, scipy. I always ended up writing my own functions that accept a mask. For 1D data at least it is possible if you mask ``x`` and ``y`` coordinate with your mask (``x=x[~y.mask]`` and ``y=y[~y.mask]``) but since this flattens the arrays that's not a possibility with n-dimensional data. – MSeifert Mar 04 '16 at 22:46
  • 1
    What about making a copy of the array only containing the unmasked data? (eg. `ma.filled`). Since now your array is not anymore a complete grid, you can't use `RectBivariateSpline` but `interp2d` should still work. You just have to specify each point as x,y,z so you'd have to generate those input data w/o help from numpy. – roadrunner66 Mar 04 '16 at 22:55
  • 1
    Many `ma` aware function use `filled` to create a temporary copy. For example `ma.sum` uses `filled(0)` to replace the masked values with innocuous 0s. What's an innocuous value in your interpolation case? – hpaulj Mar 04 '16 at 23:03
  • @roadrunner66. If I am not wrong [please correct me if I am], your proposed method should works in a similar fashion of the one proposed by MSeifert. I mean, basically eliminating the masked data to get just the data I need. – hurrdrought Mar 05 '16 at 03:56
  • @hpaulj. In my case, the problem is not only related to an innocuous value, but [not clear to me yet] also because after the interpolation, isolated unmasked grids [the ones with the actual data] disappear. Those spaces are then filled by the 'masked' data, which is a number like `-9.96921e+36`. This issue could be related to the interpolation method I am using [i.e. `interp2d` or `RectBivariateSpline`] but it remains unclear to me. – hurrdrought Mar 05 '16 at 04:05
  • Yes, didn't see that yet. I'd go that way. – roadrunner66 Mar 05 '16 at 04:07
  • You may want to take a look at astropy 's convolve function which handles NaN / masked values by "ignoring them during convolution and replacing NaN pixels with interpolated values" – ssnobody Aug 01 '17 at 00:09

3 Answers3

4

You can actually use every function that accepts x, y, z (which is the case for interp2d and probably the others as well) with your masked data. But you need to explicitly create a mgrid:

z = ... # Your data
x, y = np.mgrid[0:z.shape[0], 0:z.shape[1]]

Then you need to delete all masked values in all of these coordinates:

x = x[~z.mask]
y = y[~z.mask]
z = z[~z.mask]

With these final x, y, z you can call every of your specified functions (that accepts incomplete grids, so RectBivariateSpline won't work). Notice however that some of these use interpolation boxes so if there is a too big region where you discarded the data because of your mask the interpolation will fail there (resulting in np.nan or 0). But you might tweak the parameters to compensate for that, if it happens.

For example:

data = np.random.randint(0, 10, (5,5))
mask = np.random.uniform(0,1,(5,5)) > 0.5
z = np.ma.array(data, mask=mask)
x, y = np.mgrid[0:z.shape[0], 0:z.shape[1]]
x1 = x[~z.mask]
y1 = y[~z.mask]
z1 = z[~z.mask]
interp2d(x1, y1, z1)(np.arange(z.shape[0]), np.arange(z.shape[1]))

array([[  1.1356716 ,   2.45313727,   3.77060294,   6.09790177, 9.31328935],
       [  3.91917937,   4.        ,   4.08082063,   3.98508121, 3.73406764],
       [ 42.1933738 ,  25.0966869 ,   8.        ,   0.        , 0.        ],
       [  1.55118338,   3.        ,   4.44881662,   4.73544593, 4.        ],
       [  5.        ,   8.        ,  11.        ,   9.34152525, 3.58619652]])

you can see the small area of 0's because the mask had there many masked values:

mask
array([[False,  True,  True,  True, False],
       [False, False,  True, False, False],
       [ True,  True, False,  True,  True],
       [False,  True, False,  True,  True],
       [False,  True, False, False,  True]], dtype=bool)

data
array([[2, 4, 4, 5, 5],
       [1, 4, 1, 3, 8],
       [9, 1, 8, 0, 9],
       [7, 2, 0, 3, 4],
       [9, 6, 0, 4, 4]])
MSeifert
  • 133,177
  • 32
  • 312
  • 322
  • 2
    Your approach is interesting. I tried to run a function [as in your example] using my actual data. However, as you mentioned, large masked areas went converted into 0s. In my case a '0' matters, since I am working with climate data of temperature and precipitation. Thus, I cannot delete or mask those 0s after the interpolation. Thank you very much! – hurrdrought Mar 05 '16 at 03:35
  • Just looking at MSeifert's code I'm not sure why that would happen. If `zero` is a problem you could use `np.fill` to fill masked numbers with any arbitrary and recognizable number, like -9999 or so. – roadrunner66 Mar 05 '16 at 04:11
  • 1
    @roadrunner66 - Spline interpolation is very sensitive to outliers so filling the values with some clear outliers is not very effective. – MSeifert Mar 05 '16 at 10:54
  • @hurrdrought - The documentation clearly states how many points along each axis are needed: "The minimum number of data points required along the interpolation axis is (k+1)**2, with k=1 for linear, k=3 for cubic and k=5 for quintic interpolation." But for linear interpolation the number is just 4 points per axis so I didn't thought this would be a problem for your data. Did you try [``SmoothBivariateSpline``](http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.interpolate..html) – MSeifert Mar 05 '16 at 10:56
  • 1
    @MSeifert. I didn't mean to fill things and use them. I meant to use `fill` to generate a new , non masked array with a clear marker, that allows you to ignore those pixels when you generate the (x,y,z) data for `interp2d`. I agree that splines can be very noisy if your data are not smooth. No need to make that even worse with fake data. – roadrunner66 Mar 05 '16 at 19:10
3

The problem with the approaches outlined by @MSeifert is that the regular grid structure is lost, resulting in inefficient interpolation. It is justified only to fill in missing data by interpolation, but not for typical interpolation from one grid to another, where missing data should not be filled in.

In this case, filling missing values with np.nan is the simplest approach. These will be propagated in the calculations and the resulting array will have nans wherever a missing value was used for interpolation.

# fast interpolator that use the regular grid structure (x and y are 1D arrays)
z = z_masked.filled(np.nan)
zinterp = RegularGridInterpolator((x, y), z.T)

# new grid to interpolate on
X2, Y2 = np.meshgrid(x2, y2)
newpoints = np.array((X2, Y2)).T

# actual interpolation
z2 = zinterp(newpoints)
z2_masked = np.ma.array(z2, mask=np.isnan(z2))

For completeness, another approach is to interpolate a second mask array (filled with 1 where data is missing) to fill in missing values on the new grid.

# fast interpolator that use the regular grid structure (x and y are 1D arrays)
zinterp = RegularGridInterpolator((x, y), z.T)
minterp = RegularGridInterpolator((x, y), (mask+0.).T)

# actual interpolation
z2 = zinterp(newpoints)
mask2 = minterp(newpoints) > 0  # apply threshold, e.g. 0.5 is considered contaminated and will be removed.
z2[mask2] = np.nan  # fill with nans or whatever missing data flag

Note both approaches should also work with RectBivariateSpline, if spline interoplation is desired. And either way, this should be much faster than using interp2d...

Mahé
  • 318
  • 3
  • 9
0

i typically follow the approach described by @mseifert, but add the following refinement if i am weary of the interpolation error through the masked areas. that seems to be one of your concerns, @hurrdrought? the idea is to propagate the mask to the interpolated result. a simple example for 1D data is:

def ma_interp(newx,x,y,mask,propagate_mask=True):
    newy = np.interp(newx,x[~mask],y[~mask]) # interpolate data
    if propagate_mask: # interpolate mask & apply to interpolated data
        newmask = mask[:]
        newmask[mask] = 1; newmask[~mask] = 0
        newmask = np.interp(newx,x,newmask)
        newy = np.ma.masked_array(newy, newmask>0.5)
    return newy