
I'm working with a dataset that contains millions of polygons representing leased properties. In many cases, there are multiple polygons that are drawn on top of one another (representing multiple leases for the same property).

Currently I'm working on a process that will group these so that only one polygon geometry is used to represent multiple lease counterparts (the same property leased multiple times). I have come up with a Python script that uses an arcpy Select by Location and compares the XY centroid coordinates, perimeter, area, and extent of overlapping polygons to assign a unique ID to each group. This works just fine, but it takes forever.
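
For reference, the per-pair test in that script boils down to something like the following (a minimal pure-Python sketch with hypothetical field names; the Select by Location and cursor plumbing is omitted, and the 3% / 11 m tolerances are the same ones used in the SQL attempt further down):

    # A sketch of the attribute comparison only (hypothetical field names);
    # the arcpy Select by Location / cursor plumbing is omitted.
    def is_counterpart(a, b, rel_tol=0.03, xy_tol=11.0):
        """True if two polygon records plausibly represent the same property."""
        # Perimeter and area must agree within a relative tolerance (3%).
        for key in ('perimeter', 'calc_acres'):
            if abs(a[key] - b[key]) > rel_tol * a[key]:
                return False
        # Extent corners must agree within a fixed distance (11 m).
        for key in ('x_min', 'y_min', 'x_max', 'y_max'):
            if abs(a[key] - b[key]) > xy_tol:
                return False
        return True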

I'm wondering: would it be possible to use SQL (using SQL Server Management Studio) to identify overlapping polygons (based on a buffer of the centroid) and group by other spatial attributes of the input polygons?

I have made an attempt at getting this to work, but got too many groups. I have around 41,000 polygons (for the first county) and should have fewer than 14,000 'unique' polygons or groups. I have access to FME and ArcGIS Advanced licenses, so can anyone suggest an alternative to using an arcpy.da SearchCursor to select by location and compare selected polygon attributes in Python?

Here is my first attempt at the SQL query. The projected coordinate system uses meters, hence the ±11 m tolerance on the extents in the query.

    SELECT a.Poly_ID
    FROM
    (
        SELECT [OBJECTID]
              ,[SHAPE]
              ,[calc_acres]
              ,[perimeter]
              ,[CentX]
              ,[CentY]
              ,[X_Max]
              ,[Y_Max]
              ,[X_Min]
              ,[Y_Min]
              ,[Poly_ID]
              ,[leaseId]
              ,geometry::STGeomFromText('POINT (' + convert(varchar, [CentX]) + ' ' + convert(varchar, [CentY]) + ')', 5070) AS cent_geom
        FROM [poly_topology].[gis_mfg].[LTPOLYGONS]
    ) a
    LEFT JOIN
    (
        SELECT [OBJECTID]
              ,[SHAPE]
              ,[calc_acres]
              ,[perimeter]
              ,[CentX]
              ,[CentY]
              ,[X_Max]
              ,[Y_Max]
              ,[X_Min]
              ,[Y_Min]
              ,[Poly_ID]
              ,[leaseId]
              ,geometry::STGeomFromText('POINT (' + convert(varchar, [CentX]) + ' ' + convert(varchar, [CentY]) + ')', 5070) AS cent_geom
        FROM [poly_topology].[gis_mfg].[LTPOLYGONS]
    ) b
        ON a.cent_geom.STBuffer(6).STContains(b.cent_geom) = 1
        AND b.perimeter  > (a.perimeter  - (a.perimeter  * 0.03)) AND b.perimeter  < (a.perimeter  + (a.perimeter  * 0.03))
        AND b.calc_acres > (a.calc_acres - (a.calc_acres * 0.03)) AND b.calc_acres < (a.calc_acres + (a.calc_acres * 0.03))
        AND b.X_Max > (a.X_Max - 11) AND b.X_Max < (a.X_Max + 11)
        AND b.Y_Max > (a.Y_Max - 11) AND b.Y_Max < (a.Y_Max + 11)
        AND b.X_Min > (a.X_Min - 11) AND b.X_Min < (a.X_Min + 11)
        AND b.Y_Min > (a.Y_Min - 11) AND b.Y_Min < (a.Y_Min + 11)
        AND b.leaseId != a.leaseId
    GROUP BY a.Poly_ID
    ORDER BY a.Poly_ID

UPDATE: After incorporating FelixIP's idea into my process, I found the following link: http://clear.uconn.edu/tools/Shape_Metrics/method.htm

This Shape Metrics tool is old, but its methodologies seem more robust than relying on perimeter, area, extent, and centroid alone. Those values are susceptible to outliers, so not every polygon in a group is caught (mainly because of extraneous boundaries where two polygons were dissolved but their boundaries did not overlap completely).
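
For illustration (my own sketch, not code from that tool), one of the simplest metrics in that family is the isoperimetric quotient, 4πA/P². Being dimensionless, it adds shape information to a grouping signature without depending on absolute size:

    import math

    def compactness(area, perimeter):
        """Isoperimetric quotient: 1.0 for a circle, approaching 0 for
        elongated or ragged shapes. Dimensionless, so it can be appended
        to the grouping signature alongside the raw attributes."""
        return 4.0 * math.pi * area / (perimeter ** 2)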

  • Calculate a string field and concatenate into it string representations of your parameters, truncated to a couple of decimals (e.g. centreX, centreY, area). Dissolve polygons using this field (single part). – FelixIP Feb 10 '16 at 00:16
  • It looks to me like the gist of what you are trying to do is locate polygons on top of each other, with a little bit of tolerance because they don't line up perfectly. Have you tried the Polygon Neighbors tool? You could assume polygons with 99% coverage (or whatever you think is appropriate) are matches. Here is the link: http://pro.arcgis.com/en/pro-app/tool-reference/analysis/polygon-neighbors.htm – RHB Feb 10 '16 at 13:57
  • @RHB, I've never seen that one before but I'm trying it out now. I'll let you know how it turns out – crld Feb 10 '16 at 18:03
  • @FelixIP, That is a great idea, but it would require some reclassification of the fields, and I'm not convinced that classifying each field would result in the maximum number of matches. I will certainly give it a shot. – crld Feb 10 '16 at 18:09
  • @crld no reclass is required. It can be done with a single line of Python in the field calculator, e.g. '%s %s %s' % (int(!Shape!.centroid.X*100), int(!Shape!.centroid.Y*100), int(!Shape!.area*100)) – FelixIP Feb 10 '16 at 19:11
  • Ok; I've tried both methods. @FelixIP the string join method groups them down quite a bit (11,253 out of 42,038), but there are still a lot of instances where a small acreage/perimeter/extent/centroid difference causes counterpart polygons not to be grouped together. I could use this as the input for my other script, which would likely make it run much faster – crld Feb 10 '16 at 19:39
  • @RHB, I tried the Polygon Neighbors tool; it produced a table with 248,762 rows. I selected/exported rows where the src and nbr values match, and got 5,586 rows. Now I'm wondering how to join these back to the input polygons to get the ObjectID (or counterpart ID) associated with each polygon. I could export this as a table on an SDE and do a SQL join to the polygons (on all the fields). I also tried FelixIP's idea to create the string and join polygons to the table based on that, but couldn't get that to work out for some reason. – crld Feb 10 '16 at 19:44
  • FME has the AreaOnAreaOverlayer transformer which could possibly be used. There's also the NearestNeighbour transformer, as well as the SpatialRelator/Filters. You should be able to use some combination of them to associate the polygons to each other. – GIS-Jonathan Feb 11 '16 at 11:10
  • @FelixIP wins; the dissolve on the concatenated string produced 11,253 unique rows out of 42,038. I wrote a Python script that compared the keys of a dictionary (the concatenated strings) by splitting each string into a numpy array (vector) of floating-point values so that the comparison is done all at once. This comparison took 4 hours instead of 20 but only found 3 fewer groups than the dissolve. As such, the string comparison strategy is the fastest. FelixIP, if you leave your comment as an answer I'll mark it as such. – crld Feb 11 '16 at 17:52
  • To be clearer, the Python dictionary contains keys (concatenated strings) with values (lists of polygon IDs sharing that string); my comparison matched values from one key against other keys and extended the lists of polygon IDs where it found a match (see the sketch after these comments). Apparently the one example I found where the string matching didn't catch all polygons in a group was one of the few that got caught by this comparison – crld Feb 11 '16 at 17:55
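
A condensed sketch of the key comparison described in the last two comments (assumed structure: a dict mapping a space-separated 'centX centY area' signature string, as in FelixIP's '%s %s %s' expression, to the list of polygon IDs that produced it; the 0.05 tolerance is a placeholder):

    import numpy as np

    def merge_close_keys(groups, tol=0.05):
        """Extend each key's ID list with the IDs of near-identical keys."""
        keys = list(groups)
        # One float vector per key, so each pair is compared in one shot.
        vectors = np.array([[float(v) for v in k.split()] for k in keys])
        merged = {k: list(ids) for k, ids in groups.items()}
        for i, key in enumerate(keys):
            diffs = np.abs(vectors[i + 1:] - vectors[i])
            # Later keys whose every component is within tolerance.
            for j in np.nonzero((diffs <= tol).all(axis=1))[0]:
                merged[key].extend(groups[keys[i + 1 + j]])
        return merged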

1 Answer


I have found that replacing geometries with string "signatures" is a very efficient technique when comparing geometries, e.g. assigning point IDs to the respective start and end attributes of a polyline, or finding points that overlap. This approach, combined with dictionaries, is a game changer.

For some reason that I don't fully understand, truncation produces more robust results than the rounding or decimal-to-integer conversion described in my comments. Thus this:

    def truncate(f, n):
        """Truncate float f to n decimal places as a string, without rounding."""
        s = '{}'.format(f)
        i, p, d = s.partition('.')               # integer part, '.', decimal part
        return '.'.join([i, (d + '0' * n)[:n]])  # pad with zeros, keep first n digits

    # ============================================
    # then, in the Field Calculator expression box:
    truncate(!Shape!.centroid.X, 2) + truncate(!Shape!.centroid.Y, 2) + truncate(!Shape!.area, 2)

will potentially work better when populating the field for the dissolve. In fact, it is enough to summarise on this field and find the minimum of a sequential integer value stored in another field (FID and OBJECTID won't work with the standard Summarise tool, but are perfectly fine in a script!). This way the first unique features can be found and exported for further analysis, as in the sketch below.
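
A minimal sketch of the in-script variant (assuming the truncated signature has already been calculated into a hypothetical SIG text field, and GROUP_ID is an existing long field; the path is made up):

    import arcpy

    fc = r'C:\data\leases.gdb\LTPOLYGONS'  # hypothetical path and fields
    first_seen = {}                        # signature -> sequential group ID

    # Give every polygon the group ID of the first feature that produced
    # the same truncated signature string.
    with arcpy.da.UpdateCursor(fc, ['SIG', 'GROUP_ID']) as cursor:
        for sig, _ in cursor:
            if sig not in first_seen:
                first_seen[sig] = len(first_seen)
            cursor.updateRow([sig, first_seen[sig]])

The first feature seen for each signature then marks the "unique" polygon that can be exported for further analysis.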

FelixIP
  • This is really nice. You can also use this calculated field in the field calculator to build a unique identifier for each group, so you can set up a one-to-many relate. – RHB Feb 12 '16 at 09:59