2

I have a query that checks a lat,lon pair with each coordinate column in a PostgreSQL table. Then gets the shortest distance like so:

SELECT A.id,
    A.name,
    similarity(A.name, %s) AS sim,
    st_distance(coords, ST_SetSRID(ST_MakePoint(%s, %s), 4326), true) AS dst,
    (ST_AsGeoJSON(A.coords))::jsonb->>'coordinates' AS coordinates
FROM location_table AS A 
WHERE coords IS NOT NULL
ORDER BY (
st_distance(coords, ST_SetSRID(ST_MakePoint(%s, %s), 4326), true)
) ASC
LIMIT 1;

This is very inefficient. It takes more than 5 seconds to finish. And I want to run this for a large number of coordinate set. Is there a way to improve this query? I do not have to use these built-in functions specifically. Anything will do as long as it gives me back a single row from the location_table which has the shortest distance between coords column and the input coordinates.

update

The EXPLAIN ANALYZE output:

    Limit  (cost=4984153.25..4984154.62 rows=1 width=67) (actual time=20246.329..20303.854 rows=1 loops=1)
   ->  Gather Merge  (cost=4984153.25..5530657.07 rows=397696 width=67) (actual time=18542.905..18600.409 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Result  (cost=4983153.22..10207884.42 rows=198848 width=67) (actual time=18116.951..18118.225 rows=1 loops=3)
               ->  Sort  (cost=4983153.22..4983650.34 rows=198848 width=67) (actual time=18109.756..18111.025 rows=1 loops=3)
                     Sort Key: (st_distance((coords)::geography, '0101000020E61000009173CFC36D5505408C0A5D53FA5E4840'::geography, true))
                     Sort Method: top-N heapsort  Memory: 25kB
                     Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
                     Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
                     ->  Parallel Seq Scan on location_table a  (cost=0.00..4982158.98 rows=198848 width=67) (actual time=1967.735..17607.239 rows=158866 loops=3)
                           Filter: (coords IS NOT NULL)
                           Rows Removed by Filter: 34
 Planning Time: 5.090 ms
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 69.457 ms, Inlining 1551.634 ms, Optimization 3516.036 ms, Emission 2041.415 ms, Total 7178.541 ms
 Execution Time: 20538.702 ms
Vince
  • 20,017
  • 15
  • 45
  • 64
bcsta
  • 191
  • 5

1 Answers1

1

Thanks to Ian's comments I tried out replacing the st_distance function with <-> operator for distance as suggested. It turned out this speeds up the query by a very significant amount by using spatial indexes.

                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..2.91 rows=1 width=67) (actual time=7.406..7.411 rows=1 loops=1)
   ->  Index Scan using location_table_coords_idx on city a  (cost=0.29..1253106.30 rows=477234 width=67) (actual time=7.397..7.399 rows=1 loops=1)
         Index Cond: (coords IS NOT NULL)
         Order By: (coords <-> '0101000020E61000009173CFC36D5505408C0A5D53FA5E4840'::geometry)
 Planning Time: 0.878 ms
 Execution Time: 7.615 ms
(6 rows)
bcsta
  • 191
  • 5
  • Don't forget to cast to geography, else the result will be wrong, as <-> uses planar arithmetic in the CRS unit, so it would compute distances in "degrees" which is meaningless (a degree of latitude doesn't have the same ground length as a degree of longitude) – JGH Jan 24 '23 at 14:01