Get closest road-segment to segmentized LineString points

Question

So I have a table that stores a bunch of traffic counts for various road segments across the country. I'm regularly getting real-time location data from vehicles that are moving around a given region and I'm trying to match the segments that correlate to the path it traveled. In order to fill in some gaps in where the vehicle has traveled, I'm using Mapbox's Matching API to snap to roads. This is good for getting a general route that the vehicle may have traveled, and is returned to us in an encoded polyline. In PostGIS I'm able to decode this polyline, use ST_Segmentize to fill in any gaps between 'snapped-to-roads' points (to avoid missing any small segments) and loop over these points to find the closest segment geometry they match to. This is fine, and it technically works, but it's really slow. I'm hoping to find a faster/better way of doing this.

Edit: In case anyone was wondering, the reason we break the segments into distances of no-greater-than .000032 is to make sure we capture even the smallest road-segments when breaking up the encoded polyline.

This is what I have so far (apologies for bad formatting):

    CREATE OR REPLACE FUNCTION findASegment(polyline text)
      returns table(t_index float4, t_xdsegid TEXT, t_distance DOUBLE PRECISION) as $$
    DECLARE
      node     text;
      iterator float4 := 1;
    BEGIN

      CREATE TEMPORARY TABLE temp_table (
        t_index    float4,
        t_xdsegid  TEXT,
        t_distance DOUBLE PRECISION
      ) ON COMMIT DROP;

      FOR node IN (SELECT ST_AsText((dp).geom) as node
                   FROM (SELECT (st_dumppoints(
                         ST_Segmentize(st_linefromencodedpolyline(polyline),
     .000032))) AS dp) AS foo)

      LOOP


        INSERT INTO temp_table (t_index, t_xdsegid, t_distance)
    select iterator as t_iterator, xdsegid, distance
        FROM (SELECT bar.xdsegid, st_distance(geom, st_geomfromtext(node)) as distance from alabama as bar) as baz
        ORDER BY distance ASC
        LIMIT 1;
        iterator := iterator + 1;
      END LOOP;

      RETURN query SELECT t.t_index, t.t_xdsegid, t.t_distance FROM temp_table t;

    END; $$

    language plpgsql;

    SELECT *
    FROM (SELECT distinct on (t_xdsegid) t_xdsegid, t_index
          FROM findASegment('{hhkEzz|pOGQq@cC')
          order by t_xdsegid, t_index) as p
    ORDER BY p.t_index;

Here are some pictures as per-request:

So we initially are given two sets of lat/long coordinates. We use MapboxMatching to fill in points on a road that a vehicle could have traveled to get from point A to point B. Mapbox returns us an encoded polyline that looks like this on a map:

But, the issue with this, is that sometimes those points in the encoded polyline are too far apart. They may miss some of the smaller road-segments along that path. So we segmentize it to interpolate additional points into it.

Now it looks more like this:

Then I want to get all the possible segments it passed through from the geometry table. Each row in the table looks like this:

The reason we use distance to get the nearest segment from the table is because the snapped-to-roads geometries don't necessarily match up to the stored geometries. The geometries in the database may veer off the road slightly, making it to where contains isn't viable (to my knowledge). Hope this helps

the JOIN LATERAL in combination with the spatial index powered KNN operator <-> would be the general pattern for an effektive Nearest Neighbor search. Check this answer for a working example that can easily be adopted. — geozelot, Sep 19 '18 at 17:22
@ThingumaBob would this be done inside of the loop or instead of the loop? — Fell, Sep 19 '18 at 17:33
@Fell that could completely replace the whole function; in this setup, the lateral join provides a similar functionality as a for-each loop, implemented in plain SQL. I'd add an answer, if I had some time...adding a picture would definetely help to understand what you're after. — geozelot, Sep 19 '18 at 17:44

geozelot · Accepted Answer · 2018-09-20T05:54:54.517

This is referred to as the (K) Nearest Neighbor search; the linked answer and plenty similar on this board will have more details about it and its usage in PostGIS, including some possibilities to increase performance further.

For now I assume there is, at least, a spatial index on the <segments_table>:

WITH
  points AS (
    SELECT (ST_DumpPoints(
             ST_Segmentize(ST_LineFromEncodedPolyline(ml.geom)::geography, <distance_in_meter>))
           ).*
    FROM <mapbox_line> AS ml
  )

SELECT *
FROM (
  SELECT DISTINCT ON (seg.<id>)
         seg.<id>,
         pts.path,
         ST_Distance(pts.geom::geography, seg.geom::geohraphy) AS distance
  FROM points AS pts
  JOIN LATERAL (
    SELECT st.<id>,
           st.geom
    FROM <segments_table> AS st
    ORDER BY pts.geom <-> st.geom
    LIMIT 1
  ) AS seg
  ON true
  ORDER BY pts.<id>
) q
ORDER BY path;

This will find the closest segment to each point, with the DISTINCT ON excluding duplicates, and ordered by the path of the dumped points (the outer query is necessary because DISTINCT ON requires its field to be first in the sort order)

The <-> operator should use the one index at least (since the points are not index searchable, non-SARGable), speeding things already up a lot.

As I assume you are working with Lon/Lat (.000032 degrees?), I also used a cast to geography; that increases overall precision and let's you define the segmentation interval in meter (at a tiny cost in speed); note, this will also assume those Lon/Lats to be in EPSG:4326. I also added that to the distance calculation, again for precision and meter as return unit. Just reverse all that to your version if you have your reasons.

No function involved here, but you could add it to one like yours easily; just replace the loop.

I hope I haven't overseen something.

This seems to work very well and much faster than the solution I initially tried! I only have one question, and please forgive my Postgres ignorance, but how do I SELECT blah FROM <a string (mapbox-line)>? — Fell, Sep 20 '18 at 16:27
Nevermind, I think I figured it out. It would look like this, right?
WITH points AS ( SELECT (ST_DumpPoints( ST_Segmentize(ml, 0.00032)) ).* FROM ST_LineFromEncodedPolyline('xxx') AS ml ) — Fell, Sep 20 '18 at 16:48
@Fell mh yes. but if you simply pass in one string, you can put that part within the ST_DumpPoints(...) as you did in your query. the FROM block simply allows for table references — geozelot, Sep 20 '18 at 17:17

Get closest road-segment to segmentized LineString points

1 Answers1