I have an output from an ArcPy arcpy.Union_analysis operation and for the overlap polygons that were created I need to select out the oldest feature (along with the attributes). The result is 75740 rows.
Normally, I would just use FindIdentical_management, but I found the results unreliable for 10.2.2 (and I can't change the version I'm on). After rolling my own in arcpy.da and not getting the performance I wanted, I am trying to select out via a PostGIS query, using this answer to Delete duplicate geometry in postgis tables as template.
While this is a big improvement, I am not as experienced with PostGIS + window functions and I think there might be a faster approach?
So far, I have:
-- dtime , numeric lower values are older
WITH unique_geoms (
rowid
,geom
)
AS (
SELECT row_number() OVER (
PARTITION BY geom ORDER BY dtime ASC
) AS rowid
,geom
,oid
,dtime
,dval
FROM x_dupes
)
SELECT rowid
,geom
,oid
,dtime
,dval
FROM unique_geoms
WHERE rowid=1;
Explain: http://explain.depesz.com/s/9cV
"CTE Scan on unique_geoms (cost=10312.46..12016.61 rows=379 width=104) (actual time=701.221..2049.404 rows=40389 loops=1)"
" Filter: (rowid = 1)"
" Rows Removed by Filter: 35351"
" CTE unique_geoms"
" -> WindowAgg (cost=8797.66..10312.46 rows=75740 width=154) (actual time=701.200..1577.202 rows=75740 loops=1)"
" -> Sort (cost=8797.66..8987.01 rows=75740 width=154) (actual time=701.171..970.353 rows=75740 loops=1)"
" Sort Key: x_dupes.geom, x_dupes.dtime"
" Sort Method: quicksort Memory: 24671kB"
" -> Seq Scan on x_dupes (cost=0.00..2659.40 rows=75740 width=154) (actual time=0.011..272.713 rows=75740 loops=1)"
http://explain.depesz.com/s/9cV
PostGIS Version: USE_GEOS=1 USE_PROJ=1 USE_STATS=1