
I need to cluster the features of a point layer so that each cluster has a similar total sum of a given attribute's values.

For example, I have generated 1000 random points with values between 1 and 100. The task is to cluster these points so that each cluster has a sum of values similar to 1000.

In QGIS I have tried several clustering processing tools (K-means, DBSCAN, ST-DBSCAN) and plugins (Attribute based clustering; ClusterPoints; QGIS Scipy Clustering, which is not compatible with QGIS 3), but none of these tools or plugins offers an option to cluster features by a similar sum of an attribute. I suppose PyQGIS is needed for this, but I am missing an idea of the workflow for implementing it.

A second, similar task is to cluster these point geometries so that the sum of a numeric attribute in each cluster is at most 1000 (no overrun). Because the approach is similar, I did not create a separate question for this task.

Any suggestions or ideas on how to calculate this?

PolyGeo
Mapos
  • Can the clusters overlap? – Matt Apr 12 '23 at 12:46
  • @Matt No. Based on the clusterID assignment polygons for further calculations will be generated. The areas cannot overlap therefore. – Mapos Apr 12 '23 at 12:50
  • @BERA I think I get your point - clustering based on exactly identical sum values will not be possible (question modified). So yes, I want to get as close as possible. – Mapos Apr 12 '23 at 13:37
  • Sounds to me like an intractable problem. Are you giving a number of clusters to start with? – Micha Apr 12 '23 at 15:21
  • A similar question: https://gis.stackexchange.com/q/433700/88814 – Babel Apr 12 '23 at 15:47
  • I think this should be two questions - one for each of your tasks. – PolyGeo Apr 21 '23 at 04:43
  • There's a fair bit of information that you haven't specified in the question. For example, how well clustered do the points need to be, or is the sum the only thing that matters? And you've said "similar to 1000", but that could mean anything. If you do K-means clustering using the K ~= sum(values)/1000, probability suggests that you'll get a good number of clusters with values close to 1000. Not a bad starting point. But what's the furthest that's acceptable? You're trying to do an optimisation which doesn't neatly converge. So you need to set parameters. – Tom Brennan Apr 21 '23 at 08:32
  • @Micha Number of clusters is predefined, yes. – Mapos Apr 27 '23 at 19:49
  • @TomBrennan The points need to be clustered based on spatial proximity (close to each other) because, based on the cluster ID, polygons are created as the next step. Regarding the sum, identical values are of course not possible. Let's say with 1,000 points the sum is 10,000. A suggestion would be e.g. to stop generating a cluster if the nearby points exceed the sum of 10,000 / 1,000 = 100 and then cluster another subset of points. – Mapos Apr 27 '23 at 19:57
  • This question has a possibly useful approach: https://gis.stackexchange.com/questions/123289/grouping-village-points-based-on-distance-and-population-size-using-arcmap/123297#123297 – Tom Brennan Apr 27 '23 at 22:16
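
The greedy idea raised in the comments (grow a cluster from nearby points and stop once the next point would push the sum past the target) can be sketched in plain Python. This is only a rough sketch under stated assumptions, not a tested QGIS workflow: the function name `greedy_capped_clusters`, the `(x, y, value)` tuple layout, the choice of the westernmost free point as seed, and the use of a running centroid for the "nearest" test are all hypothetical choices made for illustration. Reading the points from a QGIS layer and writing the cluster IDs back are left out.

```python
import math
import random


def greedy_capped_clusters(points, cap):
    """Assign a cluster id to each (x, y, value) tuple in `points`.

    Each cluster starts at a seed and absorbs the unassigned point nearest
    to its running centroid until the next one would push the sum past `cap`.
    """
    labels = [-1] * len(points)
    unassigned = set(range(len(points)))
    cluster_id = 0
    while unassigned:
        # Assumption: seed with the westernmost (then southernmost) free point.
        seed = min(unassigned, key=lambda i: (points[i][0], points[i][1]))
        unassigned.remove(seed)
        labels[seed] = cluster_id
        sum_x, sum_y, count = points[seed][0], points[seed][1], 1
        total = points[seed][2]  # a single value above `cap` would still overrun
        while unassigned:
            cx, cy = sum_x / count, sum_y / count
            nearest = min(
                unassigned,
                key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy),
            )
            if total + points[nearest][2] > cap:
                break  # close this cluster instead of overrunning the cap
            unassigned.remove(nearest)
            labels[nearest] = cluster_id
            sum_x += points[nearest][0]
            sum_y += points[nearest][1]
            count += 1
            total += points[nearest][2]
        cluster_id += 1
    return labels


# Demo data as described in the question: 1000 random points, values 1-100.
random.seed(42)
pts = [(random.random(), random.random(), random.randint(1, 100)) for _ in range(1000)]
labels = greedy_capped_clusters(pts, cap=1000)

# Sum of the attribute per cluster id.
sums = {}
for (_, _, value), label in zip(pts, labels):
    sums[label] = sums.get(label, 0) + value

# Rough cluster count suggested in the comments: K ~= sum(values) / 1000.
k_estimate = round(sum(v for _, _, v in pts) / 1000)
print(len(sums), "clusters; estimate was", k_estimate)
print("min/max cluster sum:", min(sums.values()), max(sums.values()))
```

Because a cluster is closed as soon as the nearest remaining point does not fit, no cluster exceeds the cap as long as every single value is at most the cap, which covers the "no overrun" variant. The sketch has clear limits: it does not take a predefined number of clusters, the last cluster can end up well below the target, and clusters built late from leftover points may be spatially scattered, so polygons derived from them could overlap. It is probably best treated as a starting assignment to be refined, for example by swapping points between neighbouring clusters.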

0 Answers