49

I have a large number of polygons (~100000) and try to find a smart way of calculating their intersecting area with a regular grid cells.

Currently, I am creating the polygons and the grid cells using shapely (based on their corner coordinates). Then, using a simple for-loop I go through each polygon and compare it to nearby grid cells.

Just a small example to illustrate the polygons/grid cells.

from shapely.geometry import box, Polygon
# Example polygon 
xy = [[130.21001, 27.200001], [129.52, 27.34], [129.45, 27.1], [130.13, 26.950001]]
polygon_shape = Polygon(xy)
# Example grid cell
gridcell_shape = box(129.5, -27.0, 129.75, 27.25)
# The intersection
polygon_shape.intersection(gridcell_shape).area

(BTW: the grid cells have the dimensions 0.25x0.25 and the polygons 1x1 at max)

Actually this is quite fast for an individual polygon/grid cell combo with around 0.003 seconds. However, running this code on a huge amount of polygons (each one could intersect dozens of grid cells) takes around 15+ minutes (up to 30+ min depending on the number of intersecting grid cells) on my machine which is not acceptable. Unfortunately, I have no idea how it is possible to write a code for polygon intersection to get the area of overlap. Do you have any tips? Is there an alternative to shapely?

HyperCube
  • 3,658
  • 9
  • 37
  • 53
  • 1
    I'm curious how you are looping and intersecting your polygons. Can you show more code on the process? It would be easier to figure out how this can be optimized. – tdedecko Feb 05 '13 at 00:30
  • I basically take an array of lat/lon corner values and convert them in a for loop to the polygons. Then, I compare each polygon to certain grid cell, which is done in a for-loop again. See this: http://stackoverflow.com/a/13956110/1740928 – HyperCube Feb 05 '13 at 08:35

2 Answers2

75

Consider using Rtree to help identify which grid cells that a polygon may intersect. This way, you can remove the for loop used with the array of lat/lons, which is probably the slow part.

Structure your code something like this:

from shapely.ops import cascaded_union
from rtree import index
idx = index.Index()

# Populate R-tree index with bounds of grid cells
for pos, cell in enumerate(grid_cells):
    # assuming cell is a shapely object
    idx.insert(pos, cell.bounds)

# Loop through each Shapely polygon
for poly in polygons:
    # Merge cells that have overlapping bounding boxes
    merged_cells = cascaded_union([grid_cells[pos] for pos in idx.intersection(poly.bounds)])
    # Now do actual intersection
    print(poly.intersection(merged_cells).area)
Mike T
  • 38,021
  • 17
  • 143
  • 186
  • 7
    This remains an incredibly helpful answer - it should have been accepted. I had a similar problem and `Rtree` made the algorithm run around 5000 times faster. – Gabriel Jan 09 '16 at 11:13
  • 2
    Note that `Rtree` can be only used for boxes (4 points), not complex polygons. – Ikar Pohorský Feb 15 '17 at 10:58
  • For "real" polygons just add a proper `actual geometry intersects?` check for each bounds intersection. The rtree let's you reduce the search space and things are super fast. – bugmenot123 May 16 '18 at 11:09
  • While an old answer, this is the result that i found to remind me what object.. rtree.. If you have a complex poly, just envelope it and then check against that first before the complex one.. Will help speed things up again.+1 for simple example – Angry 84 Aug 30 '18 at 06:33
  • 1
    BTW, shapely has an R-tree implementation, as noted in the answer below: https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree – chris Jun 11 '19 at 19:51
28

Since 2013/2014 Shapely has STRtree. I have used it and it seems to work well.

Here is a snippet from the docstring:

STRtree is an R-tree that is created using the Sort-Tile-Recursive algorithm. STRtree takes a sequence of geometry objects as initialization parameter. After initialization the query method can be used to make a spatial query over those objects.

>>> from shapely.geometry import Polygon
>>> from shapely.strtree import STRtree
>>> polys = [Polygon(((0, 0), (1, 0), (1, 1))), Polygon(((0, 1), (0, 0), (1, 0))), Polygon(((100, 100), (101, 100), (101, 101)))]
>>> s = STRtree(polys)
>>> query_geom = Polygon([(-1, -1), (2, 0), (2, 2), (-1, 2)])
>>> result = s.query(query_geom)
>>> polys[0] in result
True
Georgy
  • 9,972
  • 7
  • 57
  • 66
Phil
  • 5,444
  • 2
  • 30
  • 55
  • 1
    This is so helpfull. Do you know if the STRtree can be serialized with pickle or marshall libraries to save it for later use? – eguaio Aug 30 '17 at 22:25
  • 1
    No, I am not familiar with the serialization capabilities of the STRtree. I believe it is completely dependent on the serialization of the _tree_handle returned by ``shapely.geos.GEOSSTRtree_create(max(2, len(geoms)))`` – Phil Sep 15 '17 at 23:38