7

I want to create a single bounding shapefile polygon of valid pixels in raster data (GeoTIFF) using python.

I need something efficient as I need to do this for hundreds of rasters. I had thought of writing out a mask using simple numpy manipulation, then passing that to gdal_polygonize.py, but that will not be quick enough. Does anyone know of a more direct way?

jonny2vests
  • 131
  • 1
  • 4
  • Are you looking for a shapefile of the bounding box of the image or a subpart of the image or is mask and the resulting polygon irregularly formed? – Kersten Oct 10 '15 at 10:40
  • 1
    You can use "plugin in QGIS called Image Boundary" (as suggested in http://gis.stackexchange.com/questions/26893/how-do-i-create-a-shapefile-showing-footprints-of-rasters/26944#26944) and then union the result to have a total boundary polygon. – fatih_dur Oct 10 '15 at 14:30
  • @Kersten, no, not the bounding box, the valid pixels. The image has nodata values which I want to ignore. – jonny2vests Oct 11 '15 at 20:20
  • @faith_dur: I want to do it in python. – jonny2vests Oct 11 '15 at 20:20
  • gdal_polyginize.py is similar, you could look there. Docs: http://gdal.org/gdal_polygonize.html. That and clamping your valid values to 1 would give you a 'mask' –  Oct 13 '15 at 14:57

2 Answers2

4

If your rasters are likely to have irregular or rotated boundaries, you can load the rasters into postgres using raster2pgsql, then running the sql it generates, something like this

raster2pgsql -I -C -s <SRID> <PATH/TO/RASTER FILE> <SCHEMA>.<DBTABLE> | psql -d <DATABASE>

To do this, you'll need to make sure you create your database first, and enable to postgis extension.

e.g.

raster2pgsql -I -C -s 27700 "/path/to/xyz.tif" public.test | psql -d rastertest

Here, the database is 'rastertest' and the raster is written into its own table, 'test'.

You can then use ST_ConvexHull to generate a polygon geometry of the outline. ST_MinConvexHull does the same, but excludes NODATA pixels.

select ST_AsEWKT(ST_ConvexHull(rast)) from test;

I'm not sure how the speed would compare to your proposed numpy mask solution, but it might be worth trying. I found on my laptop it generated the convex hull in 142ms for a 2000x2000, 16Mb tiff.

As you'll be doing this on lots of images, you'll probably want to script it in python (the psycopg2 library is the best way to access postgres from python)

Steven Kay
  • 20,405
  • 5
  • 31
  • 82
3

rasterio, shapely, and geopandas together will should do the trick:

import numpy as np
import rasterio.features as features
import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import shape
from rasterio.transform import from_origin

dx = 1
XY = [504675.55,7695881.9]
nx = 1000 
west = XY[0]-(nx*dx)/2
north = XY[1]+(nx*dx)/2
Transform = from_origin(west,north,dx,dx)

Array = np.zeros(shape = (10,10))
Array[2:4,2:4] = 1
Array[6:9,6:7] = 2

d = {}
d['val']=list()
geometry = list()

for shp, val in features.shapes(Array.astype('int16'), transform=Transform):
    d['val'].append(val)
    geometry.append(shape(shp))

    print('%s: %s' % (val, shape(shp)))
df = pd.DataFrame(data=d)
geo_df = GeoDataFrame(df,crs={'init': 'EPSG:32608'},geometry = geometry)
geo_df['area'] =  geo_df.area 
geo_df.to_file('JustSomeRectanglesInTheNWT.shp', driver = 'ESRI Shapefile')

Its a similar process if you first read a .tiff using rasterio

June Skeeter
  • 183
  • 6