4

I need to get the Difference of an input layer with ~30k polygons and an overlay layer with ~500k polygons (of which ~200k intersect the input layer).

I am trying to learn PostGIS to do this, as it will (hopefully) be much faster than e.g. QGIS. Newbie at PostGIS so probably making stupid mistakes (sorry).

I have found it easy (and fast) to use ST_Difference across the datasets as set out here. E.g.

SELECT COALESCE(ST_Difference(input.geom,overlay.geom),input.geom)
FROM input LEFT JOIN overlay ON ST_Intersects(input.geom,overlay.geom)

However, this returns every pairwise Difference. So rather than returning only those elements of the input layer which are not intersected by any element of the overlay layer, it instead gives multiple copies of the input polygons, stacked on top of each other, each with a different bite out of them corresponding to a single intersecting polygon from the overlay layer.

EDIT: Don't bother to read the rest of my original question, which is all about the bad way I tried to solve this. Skip to the excellent answer from @JGH and @geozelot.

I have tried to solve this using a double-loop approach building on the the second answer here. Essentially:

For each input polygon (first loop):
   Find the subset of the overlay polygons which intersect that input polygon (using ST_Intersects)
      (Second, nested loop) Run ST_Difference recursively with each intersecting overlay polygon
   Replace the input polygon with the result of the recursive ST_Difference 

My SQL code attempting to implement this is at the bottom of this post and I know it's broken (I am a complete SQL newbie and have made a stupid mistake I don't understand - EXPLAIN output also given below).

Is this a viable approach (if I can fix the code), or is there a simpler/better way to do this? (It seems like a common enough GIS requirement...?)

My current (broken) code:

create function ST_Layer_Difference(input_geom geometry,overlay_geom geometry) returns geometry as $$
declare
   i integer;
   j integer;
   input_array geometry[];
   output_array geometry[];
   local_overlays_array geometry[];
   temp_geom geometry;
   output_geom geometry;
begin
    input_array := array(select input_geom);
    FOR i IN 1..array_length(input_array,1) LOOP
      temp_geom := input_array[i];
      local_overlays_array := array(select ST_Intersects(input_array[i],overlay_geom));
          FOR j IN 1..array_length(local_overlays_array,1) LOOP
            temp_geom := ST_Difference(temp_geom,local_overlays_array[j]);
          END LOOP;
      output_array[i] := temp_geom;
    END LOOP;
    output_geom := unnest(output_array);
    return output_geom;
end;
$$
LANGUAGE plpgsql;

create table public."output" as select ST_Layer_Difference(input.geom,overlay.geom) from input, overlay;

And the output of explain (showing I've gone wrong on line 1!):

ERROR:  syntax error at or near "function"
LINE 1: ...FERS false, SUMMARY false, SETTINGS false) create function S...
                                                             ^
SQL state: 42601
Character: 132
MDB
  • 155
  • 10
  • 1
    Check this similar post (use the 2nd layer in the lateral join, apply the bounding box comparison stated at the bottom of the answer and remove the "priority" comparison) – JGH Mar 17 '21 at 17:36
  • 1
    What statement is generating the error you are seeing? You don't explain a create statement, you would just run it, and explain a query calling the said function. Also it looks like the previous statement (with ...FERS false, SUMMARY false, SETTINGS false)) was not properly terminated – JGH Mar 17 '21 at 17:37
  • @JGH I will look into lateral join, thanks, it looks powerful. This will reveal my ignorance I imagine, but I am simply typing the above code into the sql query window in PgAdmin. – MDB Mar 17 '21 at 19:01
  • 1
    Try to execute (F5) the code that creates the function (the little play button to the left of the explain button, which generates this error) – JGH Mar 17 '21 at 19:10
  • @JGH Thanks for being so helpful. Unfortunately even with the useful link you provided, and some reading around, I'm having trouble getting my head round how to use lateral join here. If you are able to provide any more guidance, e.g. a brief example, I would be very grateful. (I've looked at the example you linked and I can't quite see how it's working or how to use it for my problem.) Sorry and thanks. – MDB Mar 17 '21 at 20:07

1 Answers1

7

Expanding on this answer, we can build a query that will process each geometry of table 1, then, for each of them (via the lateral join), union all intersecting geometries from the 2nd layer and finally keep either the entire original shape or compute the difference

SELECT ST_Multi(COALESCE(
         ST_Difference(a.geom, blade.geom),
         a.geom
       )) AS geom
FROM   table1 AS a
CROSS JOIN LATERAL (
  SELECT ST_Union(b.geom) AS geom
  FROM   table2 AS b
  WHERE  ST_Intersects(a.geom, b.geom) 
) AS blade
;
JGH
  • 41,794
  • 3
  • 43
  • 89
  • @geozelot I feel bad, this answer is almost a copy/paste of your previous post... If you want to write an answer with these small changes I will promptly remove this one – JGH Mar 17 '21 at 21:00
  • 1
    Would it be more or less performant to use ST_Intersects(a.geom, b.geom) instead of a.geom && b.geom? – dr_jts Mar 17 '21 at 21:42
  • @JGH Thank you so much for taking the time to provide that, I'm really grateful. It's running now, looking forward to the results. Not that it matters, but the only thing I don't get now from that code is what role the two "AS geom" statements are playing. – MDB Mar 17 '21 at 21:48
  • @JGH I would also be interested in your views on dr_jts's question - I'm guessing it might depend on the dataset e.g. how much unnecessary intersect testing there would be vs. how much unnecessary union-ing? – MDB Mar 17 '21 at 21:48
  • @dr_jts I set both versions going as a test. ST_Intersects has won - complete in 21 min 48 secs. && has been going for about twice that long and is still not complete. But as per above comment I guess this could be dataset-dependent? – MDB Mar 17 '21 at 22:45
  • @dr_jts In case of interest, the && version took 1 hr 15 mins. – MDB Mar 17 '21 at 23:25
  • 1
    @MDB thanks for trying this and reporting back. I would expect using ST_Intersects to be more performant in most situations. – dr_jts Mar 17 '21 at 23:53
  • @dr_jts Thanks - is that because ST_Union is much more expensive than ST_Intersects, so you want to minimise use of ST_Union? – MDB Mar 18 '21 at 06:41
  • 2
    Yes, that's why. ST_Intersects is a simpler computation than ST_Union (and in many cases can be short-circuited and be very fast). – dr_jts Mar 18 '21 at 16:20