3

I have some thousands of houses/addresses (points) and a road network and I would like to build clusters of defined minimum to maximum cluster size not using euclidean distance as a metric, but the distance between the addresses along the road network (where I'm not sure if I'm using the word 'cluster' in the correct context here).

I read this paper http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.6037&rep=rep1&type=pdf and had a look at PostGIS function ST_ClusterDBSCAN, which is unsuitable for my task because DBSCAN algorithm is not intented to limit cluster size nor is it possible to define a non-euclidean metric for the input data.

K-Means approach might not be suitable, because the numbers of clusters is not defined a-priori due to varying count of addresses within a cluster.

Everything seems to point me to hierachical clustering (I had alook at How to cluster points into clusters of a maximum diameter in PostGIS?), but by now I could not find a solution taking the metric along a spatial network into account.

For better understanding ref. to https://youtu.be/iQvpPl-J7b4?t=328 , the guys do something similar to what I'm after.

Are there any open solutions to this?

Jochen Schwarze
  • 14,605
  • 7
  • 49
  • 117

1 Answers1

2

I use the query construct at the end of the answer to assign census data to parts of the small transect pieces and adopted it to your context. IMO the st_distance and st_shortestline will do the right job also in a LINESTING <-> POINT context. The expression:

SELECT st_distance(st_point(0,0), 
       st_makeline(st_point(-1,-1), st_point(1,1)));

gives you the distance zero. And

SELECT st_astext(
         st_shortestline(st_point(0,0), 
                st_makeline(st_point(-1,-1), st_point(1,1))));

      st_astext      
---------------------
 LINESTRING(0 0,0 0)

I think (not shure) it is important to work with road segments. Here is the Pseudo-code. The create table statements are only given to understand the query.

-- --------------------------------------
-- Structure for the adresses
-- --------------------------------------
CREATE TABLE addresses (
  pk SERIAL PRIMARY KEY,
  zip VARCHAR(32),
  name VARCHAR(32),
  street VARCHAR(32),
  number VARCHAR(32)

);

-- --------------------------------------
-- Structure for the  road segments
-- --------------------------------------
CREATE TABLE roads (
  pk SERIAL PRIMARY KEY,
  rid int -- part of ROAD 
  name VARCHAR(32)
);

SELECT AddGeometryColumn('addresses', 'geom', 32633, 'POINT', 2);
SELECT AddGeometryColumn('roads', 'geom', 32633, 'LINESTRING', 2);

-- INSERT SOME ADDRESS & ROAD NETWORK STUFF
-- ....

-- --------------------------------------
-- SELECT MAGIC
-- --------------------------------------
SELECT
DISTINCT ON
      (a.pk) a.*,
      b.pk,
      b.rid,
      st_distance(a.pos, b.geom) AS distance,
      st_shortestline(a.pos, b.geom) AS geom
FROM  addresses a,
      roads b
WHERE
      distance < :MAX_DISTANCE
ORDER BY
      a.pk, st_distance(a.geom, b.geom);

To group, count and aggregate the "measured stuff" (points) into a cluster, you could embed this query and apply filter and group expressions of your needs (group by rid for example).

huckfinn
  • 3,583
  • 17
  • 35