2

I am looking for an efficient way to remove any polygons from a GeoDataFrame which do not intersect with any polygons of another GeoDataFrame. I tried it the following way which seems really inefficient.

import geopandas as gpd

gdf1 = gdp.read_file('./my_geodataframe1.gpkg') gdf2 = gdp.read_file('./my_geodataframe2.gpkg')

gdf2_dissolved = gdf2.dissolve() indices_to_remove = []

for index, row in gdf1.iterrows(): polygon = row['geometry'] if not gdf2_dissolved.intersects(polygon)[0]: indices_to_remove.append(index)

Is there any quicker way to do so?

gene
  • 54,868
  • 3
  • 110
  • 187
wup017
  • 23
  • 3
  • You can try this one liner, but I think internally it's just a for loop, so may not be any quicker.

    gdf1 = gdf1[gdf1.geometry.intersects(gdf2_dissolved)]

    – Shawn Nov 03 '23 at 19:54

2 Answers2

2

Spatial join is very fast. Try this:

import geopandas as gpd
import os
folder = r"/home/bera/Drives/data1/data/LMV/Topografi100/Topografi_100_Sverige_230721"

df1 = gpd.read_file(os.path.join(folder, "mark_sverige.gpkg"), layer="markframkomlighet") df2 = gpd.read_file(os.path.join(folder, "naturvard_sverige.gpkg"), layer="skyddadnatur")

print(df1.shape) #(3320, 5)

#Create an id column to filter by after the spatial join df1["tempid"] = range(df1.shape[0])

#Spatial join with predicate "inner" to only keep features intersecting matches = gpd.sjoin(left_df=df1, right_df=df2, how="inner").tempid #A Series of tempids in df1 intersecting df2

#Select, and drop tempid column df1 = df1.loc[df1.tempid.isin(matches)].drop(columns="tempid")

print(df1.shape) #(1623, 5)

BERA
  • 72,339
  • 13
  • 72
  • 161
2

There are many solutions, see Intersecting two shape problem using geopandas for example

Generally, iterrows should only be used in specific cases (How to iterate over rows in a DataFrame in Pandas)

In addition to Bera's solution, you can use the apply function and the simple unary_union property

enter image description here

gdf1 = gpd.read_file('stac1.shp')
gdf2 = gpd.read_file('stac2.shp')
uu = gdf1.unary_union
gdf2['inter'] = gdf2['geometry'].apply(lambda row: row.intersects(uu))
# polygons from gdf2 which do not intersect with any polygons of gdf1
print(gdf2.loc[gdf2.inter == 0]
   id  ori              geometry                                inter
2   3  gdf2  POLYGON ((-1.095 0.843, -0.376 0.760, -1.022 0...  False
3   2  gdf2  POLYGON ((-0.417 -0.616, 0.173 -0.912, -0.398 ...  False

Another solution is to use a simple for loop (without iterating over the GeoDataFrame):

 indices_to_remove = []
 for i,geom in enumerate(list(gdf2.geometry)):
      if not geom.intersects(uu):
           print(i)
           indices_to_remove.append(i)
 2
 3
 print(gdf2.iloc[indices_to_remove])
    id   ori           geometry
 2   3  gdf2  POLYGON ((-1.095 0.843, -0.376 0.760, -1.022 0...  
 3   2  gdf2  POLYGON ((-0.417 -0.616, 0.173 -0.912, -0.398 ... 
gene
  • 54,868
  • 3
  • 110
  • 187