11

I have a netcdf file showing elevation over Europe derived from GTopo30. Turns out that the extent of this raster file is too big to combine it with other layers, so I would like to crop it to a given extent specified in lat/lon coordinates. I have seen some examples here and here trying to make this operation I am describing, but it seems they are not working in this case.

The raster file description looks like this:

<xarray.DataArray (band: 14, y: 5520, x: 8400)>
[649152000 values with dtype=float64]
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14
  * y        (y) float64 71.0 70.99 70.98 70.97 70.96 ... 25.03 25.02 25.01 25.0
  * x        (x) float64 -25.0 -24.99 -24.98 -24.97 ... 44.97 44.98 44.99 45.0
Attributes:
    transform:     (0.0083333333, 0.0, -25.000139509, 0.0, -0.0083333333, 70....
    crs:           +init=epsg:4326
    res:           (0.0083333333, 0.0083333333)
    is_tiled:      0
    nodatavals:    (-1.7e+308, -1.7e+308, -1.7e+308, -1.7e+308, -1.7e+308, -1...
    scales:        (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1....
    offsets:       (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
    descriptions:  ('tpi', 'slope', 'aspect', 'Pvec', 'Qvec', 'alt', 'dist2co...

So, as you can see, the longitude and longitudes are stored under the "x" and "y" coordinates and they come in WGS84 (EPSG:4326) format.

The extent is this:

min_lon = -24.995
min_lat = 25.05
max_lon = 45.50
max_lat = 71.55

I tried to use xarray's filters and subsetting, but it seems I am doing something wrong, since I can't get the required slice:

# Example 1
sel1 = band.sel(x=(band.x < max_lon) | (band.x > min_lon))
sel2 = sel1.sel(y=(sel1.y < max_lat) | (sel1.y > min_lat))

# Example 2
sel1 = band.where((band.x < max_lon) | (band.x > min_lon))
sel2 = sel1.where((sel1.x < max_lat) | (sel1.x > min_lat))

Both of them produce: KeyError: "not all values found in index 'x'"

What is the correct xarray way of cropping a large raster to a given lat/lon window? Thanks!

Irene
  • 1,220
  • 1
  • 14
  • 25

3 Answers3

11
min_lon = -24.995 
min_lat = 25.05 
max_lon = 45.50 
max_lat = 71.55 


cropped_ds = ds.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))
Kadir Şahbaz
  • 76,800
  • 56
  • 247
  • 389
Paula
  • 126
  • 1
  • 2
  • This works for the given case, but for anyone working with triagular mesh it doesn't work because x and y are only cooridnates and not dimesnions – ciskoh Nov 28 '22 at 11:46
10

You could use rioxarray: https://corteva.github.io/rioxarray/stable/examples/clip_box.html

import rioxarray

min_lon = -24.995
min_lat = 25.05
max_lon = 45.50
max_lat = 71.55

subset = band.rio.clip_box(minx=min_lon, miny=min_lat, maxx=max_lon, maxy=max_lat)
snowman2
  • 7,321
  • 12
  • 29
  • 54
6

One way is to create a boolean mask for the dataset coordinates using the extent you specified and then using the .where() method on the dataset.

Here is one example using a tutorial dataset that comes with xarray.

First, load the dataset (passing the decode_times=False argument because, at least in my case, it raises an error otherwise) and inspect it.

import xarray as xr

ds = xr.tutorial.open_dataset('rasm', decode_times=False).load()

>>> ds <xarray.Dataset> Dimensions: (time: 36, x: 275, y: 205) Coordinates:

  • time (time) float64 7.226e+05 7.226e+05 ... 7.236e+05 7.237e+05 xc (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91 yc (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51

Dimensions without coordinates: x, y Data variables: Tair (time, y, x) float64 nan nan nan nan nan ... 29.8 28.66 28.19 28.21 Attributes: title: /workspace/jhamman/processed/R1002RBRxaaa01a/l... institution: U.W. source: RACM R1002RBRxaaa01a output_frequency: daily output_mode: averaged convention: CF-1.4 references: Based on the initial model of Liang et al., 19... comment: Output from the Variable Infiltration Capacity... nco_openmp_thread_number: 1 NCO: "4.6.0" history: Tue Dec 27 14:15:22 2016: ncatted -a dimension...

You can see that it has 275 columns on x and 205 rows on y as well as xc and yc coordinates (which have a different name on your dataset).

Now, you can create the mask for indexing purposes using the extent. It should look something like this:

min_lon = -24.995
min_lat = 25.05
max_lon = 45.50
max_lat = 71.55

mask_lon = (ds.xc >= min_lon) & (ds.xc <= max_lon) mask_lat = (ds.yc >= min_lat) & (ds.yc <= max_lat)

Finally, it is just a matter of using the where() method and specifying drop=True as an argument.

cropped_ds = ds.where(mask_lon & mask_lat, drop=True)

If you inspect it, you will see that it has different dimensions (116 columns for x and 101 rows for y).

>>> cropped_ds
<xarray.Dataset>
Dimensions:  (time: 36, x: 116, y: 101)
Coordinates:
  * time     (time) float64 7.226e+05 7.226e+05 ... 7.236e+05 7.237e+05
    xc       (y, x) float64 3.504 350.8 346.0 343.5 ... 17.65 17.4 17.15 16.91
    yc       (y, x) float64 89.48 89.05 88.61 88.16 ... 28.26 28.01 27.76 27.51
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan ... 29.8 28.66 28.19 28.21
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...
Marcelo Villa
  • 5,928
  • 2
  • 19
  • 38
  • This loads the entire dataset into memory and causes a memory allocation error for me. In the past I was able to use .sel with conditions on different coordinates to load in a piece of the dataset – pbreach May 11 '21 at 19:21
  • This did not work correctly for me when using CMIP netCDF data. It adds the lat and lon dimensions to time_bnds, lat_bnds and lon_bnds, potentially giving a larger file than when we started. But replacing your ds.where command with cropped_ds = ds.sel(lat=slice(min_lat, max_lat), lon=slice(min_lon, max_lon)) seems to work. – Peter B Apr 04 '22 at 06:04