3

I am new to PostGIS.

How do I calculate the distance and direction between every site (approx 30,000 record) and the nearest entity in a gazetteer (approx 300,000 records)?

Ideally a separate table would be created that contains the site_id, entity_id, distance between, direction between (entity & site). I have looked at an answer supplied by Tobias Herrmann in late 2014 - paraphrased below.

This code was written to do the calculation of distance, but is a work in progress.

SELECT site.gid AS gid_1, gazetteer.gid AS gid_2, ST_Distance(site.geom, gazetteer.geom) AS mindist
FROM table site, table gazetteer WHERE site.gid != gazetteer.gid AND ST_Distance(site.geom, gazetteer.geom) != 0
ORDER BY ST_Distance(site.geom, gazetteeer.geom)
LIMIT 1;
PolyGeo
  • 65,136
  • 29
  • 109
  • 338
  • 3
    this might be dangerously close to shameless self promotion, but I will leave some links to some of my own answers here, as thy IMHO do a good job at explaining the concept and, @HeikkiVesanto, represent the 'modern' way using the wonderful <-> operator. technically, this question is a close duplicate to most of these linked Q/As: https://gis.stackexchange.com/a/338328/93656, https://gis.stackexchange.com/a/333032/93656, https://gis.stackexchange.com/a/332103/93656, https://gis.stackexchange.com/a/278362/93656, https://gis.stackexchange.com/a/279397/93656 – geozelot Oct 30 '19 at 12:30
  • this question is definitely a duplicate – ziggy Oct 30 '19 at 13:29

1 Answers1

3

Modified from:

PostGIS nearest point with LATERAL JOIN in PostgreSQL 9.3+

Lateral joins really are the best way to do nearest searches in PostGIS.

create table nearest_w_dir as
select
a.gid as site_id, ij.entity_id, ij.distance_between, ij.direction_between
from site_table a
left join lateral 
(select b.gid as entity_id, ST_Distance(a.geom, b.geom) as distance_between,
ST_Azimuth(a.geom, b.geom) as direction_between
from gazetteer_table b
order by a.geom <-> b.geom asc nulls last
limit 1
) ij on true;

You can read up on the direction from:

https://postgis.net/docs/ST_Azimuth.html

HeikkiVesanto
  • 16,433
  • 2
  • 46
  • 68
  • 1
    Or indeed this answer, see edit 2. – John Powell Oct 30 '19 at 13:39
  • One big thing to note: adding a second limiting expression (e.g. ST_DWithin, a.geom && ST_Expand(b.geom, ...) adds significant overhead to the highly optimized <-> distance order. Don't use that syntax anymore! – geozelot Oct 30 '19 at 13:55
  • Thanks @JohnPowell I knew there was a good lateral join answer somewhere, but din't come up in the search. – HeikkiVesanto Oct 30 '19 at 14:06
  • @ThingumaBob. When was that fixed, ie, which version of Postgres (as it is Postgres, not Postgis functionality, correct)? It always used to be the case that unless one side of the <-> operator was a constant, you needed something like ST_DWithin to limit it, such as the case when you have a.geom <-> b.geom. – John Powell Oct 30 '19 at 15:16
  • @JohnPowell not sure if anything changed at all, except maybe the interpretation of 'constants'; after all, the LATERAL JOIN passes a row to it's right hand expression (query), which is, for that very instance of it, considered as a constant (those expressions/queries are executed once per left hand row, so there's really nothing to plan ahead for the planner on an individual subquery level); I, too, doubt it's a PostGIS improvement, but I can check the logs for any mentioning. honestly, I always had the impression the extra limitation expression adds overhead, at least since 9.x (5?) – geozelot Oct 30 '19 at 16:17
  • @Thingumabob. You may be right. I have done extensive experiments on quite large tables and found that ST_DWithin usually helped, unless one side was a constant, by which I mean an explicitly defined geometry, such as ST_MakePoint of ST_GeomFromText would give you. Using a lot of EXPLAIN and head scratching, I never really did come up with a clear answer one way or another. You know how query optimization can be a bit of a black art with Postgres. – John Powell Oct 30 '19 at 16:27
  • Did a test with st_dwithin 100, and st_distance vs a.geom <-> b.geom with 70k sites, and 4mil to search from. Both took 7 seconds. Increasing the search radius to 1000 took 1 minute. I am surprised <-> is so quick. I thought it would have to compare every geometry in the table. – HeikkiVesanto Oct 30 '19 at 16:59
  • 1
    Edited answer to use <-> – HeikkiVesanto Oct 30 '19 at 17:09
  • 1
    @HeikkiVesanto actually, the whole point of that operator is that it doesn't; it will look up the index of the running table to preselect geometries, and calculates distance only to those where bbox comparison cannot decide. this does only work if it is used as the ordering clause, otherwise it will just calculate plain distance. using any other means of limiting the geometries to calculate distance on (e.g. ST_DWithin), together with e.g. ST_Distance, semantically does the same thing, however the size of the result set to calculate on is bound to what is found within the threshold – geozelot Oct 30 '19 at 17:15
  • Thanks to HeikkiVesanto particular, I have run the query and generated some output, which I am now trying to make sense of the results. To add context, I am looking to create a reverse geocoder for museum records of specimens collected in rural & remote locations in Australia. The aim is to produce a location description something along the lines of “5 km north-west of Mt. Willis, Victoria” – D.W. Murray Oct 31 '19 at 03:21