1

I have this query, which is very slow (+50 seconds):

SELECT e.gid, e.geom, osm_id, name
    FROM roads_vlaanderen_edges_2021_2022 e 
    inner join(select st_buffer(geom,30) geom from projecten where gid=126) p
    on st_contains(p.geom, e.geom);

EXPLAIN ANALYZE gives this:

"Nested Loop  (cost=85.60..86580.42 rows=1620 width=126) (actual time=26.775..49821.592 rows=3176 loops=1)"
"  ->  Seq Scan on projecten  (cost=0.00..1.30 rows=1 width=58909) (actual time=0.007..0.013 rows=1 loops=1)"
"        Filter: (gid = 126)"
"        Rows Removed by Filter: 30"
"  ->  Bitmap Heap Scan on roads_vlaanderen_edges_2021_2022 e  (cost=85.60..86577.50 rows=162 width=126) (actual time=26.763..49814.758 rows=3176 loops=1)"
"        Filter: st_contains(st_buffer(projecten.geom, '30'::double precision, ''::text), geom)"
"        Rows Removed by Filter: 7436"
"        Heap Blocks: exact=3548"
"        ->  Bitmap Index Scan on roads_vlaanderen_geom_2021_2022  (cost=0.00..85.56 rows=1620 width=0) (actual time=1.248..1.249 rows=10612 loops=1)"
"              Index Cond: (geom @ st_buffer(projecten.geom, '30'::double precision, ''::text))"
"Planning Time: 0.413 ms"
"Execution Time: 49824.095 ms"

Without the buffer the query is very fast (<1 second)

SELECT e.gid, e.geom, osm_id, name
    FROM roads_vlaanderen_edges_2021_2022 e 
    inner join(select geom from projecten where gid=126) p
    on st_contains(p.geom, e.geom);

The subquery itself is also very fast (<1 second)

select st_buffer(geom,30) geom from projecten where gid=126

So why is my first query that slow? (My tables are spatially indexed, and in a coordinate system in meter.)

ChatGPT came up with this solution, and it worked (<1 second), but I have no idea why this is different.

create temporary table project_buffer as
select st_buffer(geom,30) geom from projecten where gid=126;

SELECT e.gid, e.geom, osm_id, name FROM roads_vlaanderen_edges_2021_2022 e inner join project_buffer p on st_contains(p.geom, e.geom);

DROP TABLE project_buffer;

I do regular queries like this, and never encountered this problem.

1 Answers1

1

You seem to have a geom index that is used, so you can start by doing an analyze on your table to be sure to have your table metadata updated, sometimes it can have a big impact on the planner.

Then, there is another solution to have (probably) the same result without creating the temporary table: materialize a CTE (it's like a temp table, but just inside your query):

WITH project_buffer as MATERIALIZED (
  select st_buffer(geom, 30) geom from projecten where gid = 126
)
SELECT
  e.gid,
  e.geom,
  osm_id,
  name
FROM
  roads_vlaanderen_edges_2021_2022 e
  inner join project_buffer p on st_contains(p.geom, e.geom)

But the main problem if you want a performant query is probably that you create a custom geom on the fly. The projecten table probably have a complex geom that is costly to create a buffer from, and the planner probably don't optimize well this step and create a new geom with the buffer a lot more times that it should do (maybe for each processor ? or there is a cache problem somewhere that use the disk because the geom is big ?).

Also, what takes the most time is the st_contains test in the Bitmap Heap Scan so you should remove that as well, you don't really seem to need it, you apparently want to select by distance from your geom.

Instead of creating a new geometry with ST_Buffer, you should try to use ST_DWithin to filter the data at a max distance from your geom, it should do what you're looking for, and don't have the st_contains nor the geometry creation :

WITH proj as (
  select geom from projecten where gid = 126
)
SELECT
  e.gid,
  e.geom,
  osm_id,
  name
FROM
  roads_vlaanderen_edges_2021_2022 e
  inner join proj p on ST_DWithin(p.geom, e.geom, 30)

Not sure if it will be faster in practice, but I think it's worth the try, sometimes postgis optimization is a bit like black magic...

robin loche
  • 2,465
  • 5
  • 9
  • Thanks for your answer!

    I've analysed the execution of the original query with SET track_functions TO 'all'; and SELECT * FROM pg_stat_user_functions; Apperently the ST_BUFFER was executed 10612 times. So after aplying the index the subquery was executed for each remaining row. I thought a subquery was only executed only once. So indeed, the postgis optimization didn't do its work. Is this a bug, or normal behaviour?

    Forcing it with WITH ... MATERIALIZED did the trick. ST_BUFFER was executed once.

    – Bart De Wit May 31 '23 at 19:07
  • Your second solution ST_DWithin was a little bit slower. For my case, and according to the statistics: every ST_CONTAINS execution takes +- 8ns, an ST_WITHIN takes +- 160ns. Every `ST_BUFFER' takes 4500ns. For (1 st_buffer and 10612 st_contains) or 10612 st_withins, the first one is the fastest.

    For completeness, the geometries are not that complex or big, and I've monitored my memory usage. No problems there. The CPU was at 99.9%

    Thanks again for your response.

    – Bart De Wit May 31 '23 at 19:08
  • 1
    Is it really ST_WITHIN that you use or ST_DWithin ? That's not the same function, so just to be sure if it's a typo or not. If it's really ST_DWithin, it's surprising that it's slower. Anyway, the fact that it's a full join type with each line potentially matching each other must be the reason for this, materialize could be the solution here (preferentially with a SSD), but maybe there is something to do with lateral join like a knn search instead of full join (if you don't know: https://gis.stackexchange.com/questions/155373/postgis-nearest-point-with-lateral-join-in-postgresql-9-3) – robin loche Jun 01 '23 at 08:22