What is the fastest way to select oldest, unique geometry from a table containing duplicates in PostGIS?

Question

I have an output from an ArcPy arcpy.Union_analysis operation and for the overlap polygons that were created I need to select out the oldest feature (along with the attributes). The result is 75740 rows.

Normally, I would just use FindIdentical_management, but I found the results unreliable for 10.2.2 (and I can't change the version I'm on). After rolling my own in arcpy.da and not getting the performance I wanted, I am trying to select out via a PostGIS query, using this answer to Delete duplicate geometry in postgis tables as template.

While this is a big improvement, I am not as experienced with PostGIS + window functions and I think there might be a faster approach?

So far, I have:

-- dtime , numeric lower values are older
WITH unique_geoms (
    rowid
    ,geom
    )
AS (
    SELECT row_number() OVER (
            PARTITION BY geom ORDER BY dtime ASC
            ) AS rowid
        ,geom
        ,oid
        ,dtime
        ,dval
    FROM x_dupes
    )
SELECT rowid
    ,geom
    ,oid
    ,dtime
    ,dval
FROM unique_geoms
WHERE rowid=1;

Explain: http://explain.depesz.com/s/9cV

"CTE Scan on unique_geoms  (cost=10312.46..12016.61 rows=379 width=104) (actual time=701.221..2049.404 rows=40389 loops=1)"
"  Filter: (rowid = 1)"
"  Rows Removed by Filter: 35351"
"  CTE unique_geoms"
"    ->  WindowAgg  (cost=8797.66..10312.46 rows=75740 width=154) (actual time=701.200..1577.202 rows=75740 loops=1)"
"          ->  Sort  (cost=8797.66..8987.01 rows=75740 width=154) (actual time=701.171..970.353 rows=75740 loops=1)"
"                Sort Key: x_dupes.geom, x_dupes.dtime"
"                Sort Method: quicksort  Memory: 24671kB"
"                ->  Seq Scan on x_dupes  (cost=0.00..2659.40 rows=75740 width=154) (actual time=0.011..272.713 rows=75740 loops=1)"

http://explain.depesz.com/s/9cV

PostGIS Version: USE_GEOS=1 USE_PROJ=1 USE_STATS=1

Do you have an index on the x_dupes geometry column? Might not make a difference, but worth asking. — John Powell, Jul 03 '15 at 07:17
Yes and it did not make a difference. But I did load the data into an unlogged table--and for the same query that was much faster (≈ 40% faster). — Jay Cummins, Jul 03 '15 at 13:09

What is the fastest way to select oldest, unique geometry from a table containing duplicates in PostGIS?

0 Answers0