I have a question about improving sorting performance on geospatial data in PostgreSQL. I want to sort restaurants by a combination of distance and price, but it's noticeably slower than an equivalent test I ran on MongoDB.
My Query:
SELECT (distance + price) AS score, name
FROM (
    SELECT geom <-> 'SRID=4326;POINT(0.0 0.0)' AS distance, price, name
    FROM restaurants
    WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(0.0, 0.0), 4326)::geography, 10000)
) AS t
ORDER BY score DESC NULLS LAST
LIMIT 1000;
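One experiment worth trying (a sketch, not a verified fix: the `candidates` name is mine, and `MATERIALIZED` requires PostgreSQL 12+) is to force the distance expression to be evaluated once during the scan, instead of letting the planner inline the subquery and re-evaluate `<->` inside the parallel sort key:

```sql
-- Sketch: materialize the candidate set so distance is computed in the
-- CTE scan, not in the Sort Key of the outer ORDER BY.
WITH candidates AS MATERIALIZED (
    SELECT geom <-> 'SRID=4326;POINT(0.0 0.0)'::geography AS distance,
           price, name
    FROM restaurants
    WHERE ST_DWithin(geom, 'SRID=4326;POINT(0.0 0.0)'::geography, 10000)
)
SELECT (distance + price) AS score, name
FROM candidates
ORDER BY score DESC NULLS LAST
LIMIT 1000;
```

Comparing EXPLAIN ANALYZE output for both forms would show whether the Sort Key still contains the `<->` expression or just a plain column reference.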
My Index:
CREATE INDEX geom_idx ON restaurants USING GIST (geom);
ANALYZE;
When I run EXPLAIN ANALYZE, I get this result:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=457318.37..457393.17 rows=204 width=35) (actual time=564.937..724.273 rows=1000 loops=1)
   ->  Gather Merge  (cost=457318.37..457393.17 rows=204 width=35) (actual time=564.935..724.241 rows=1000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Result  (cost=456318.35..456371.13 rows=102 width=35) (actual time=498.930..499.574 rows=614 loops=3)
               ->  Sort  (cost=456318.35..456318.60 rows=102 width=59) (actual time=498.927..498.988 rows=614 loops=3)
                     Sort Key: (((restaurants.geom <-> '01010101010101010101010101010101010101010101010101'::geography) + (restaurants.price)::double precision)) DESC NULLS LAST
                     Sort Method: top-N heapsort  Memory: 271kB
                     Worker 0:  Sort Method: top-N heapsort  Memory: 274kB
                     Worker 1:  Sort Method: top-N heapsort  Memory: 272kB
                     ->  Parallel Bitmap Heap Scan on restaurants  (cost=2774.30..456314.94 rows=102 width=59) (actual time=266.590..495.592 rows=9616 loops=3)
                           Filter: st_dwithin(geom, '01010101010101010101010101010101010101010101010101'::geography, '10000'::double precision, true)
                           Rows Removed by Filter: 9647
                           Heap Blocks: exact=18183
                           ->  Bitmap Index Scan on geom_idx  (cost=0.00..2774.24 rows=33293 width=0) (actual time=307.298..307.298 rows=57789 loops=1)
                                 Index Cond: (geom && _st_expand('01010101010101010101010101010101010101010101010101'::geography, '10000'::double precision))
Planning Time: 0.218 ms
Execution Time: 725.379 ms
(18 rows)
While it's quite fast, I'm surprised by how large the gap is between sorting by distance alone and sorting by a calculated value. It's also slower than my MongoDB test.
Is there any way to reduce latency when sorting the rows returned by an ST_DWithin query on a calculated value, such as a combination of a restaurant's distance and price?
CLUSTER restaurants USING <spatial_index_name>; and definitely run a VACUUM ANALYZE FULL! – geozelot Dec 02 '22 at 13:14
CLUSTER is only available when no rows contain NULL, and not all of my crawled restaurant data has a geotag. This makes me wonder if I can separate the restaurants into restaurants_with_geotag and restaurants_without_geotag and use a UNION operation in other queries without a performance penalty. Thank you – Yukha Dharmeswara Dec 03 '22 at 10:26
NULL handling is primarily managed by the index implementation used to cluster. Pages with NULLs will get appended on disk. – geozelot Dec 03 '22 at 13:35
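For reference, the clustering suggestion from the first comment would look like the sketch below, using the geom_idx index created earlier. Two caveats: VACUUM ANALYZE FULL is not valid syntax (the full-rewrite form is VACUUM (FULL, ANALYZE)), and since CLUSTER already rewrites the table, a plain ANALYZE afterwards is usually sufficient:

```sql
-- Physically reorder the table rows to follow the spatial index,
-- then refresh planner statistics for the rewritten table.
CLUSTER restaurants USING geom_idx;
ANALYZE restaurants;
```

Whether this helps here is an open question: it improves locality for index-driven range scans, but the sort on (distance + price) still has to evaluate the expression for every row within the 10 km radius.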