14

I have a vector layer with millions of polygons making continuous coverage. I need to classify them according to their shape. I am already using several shape indexes from landscape ecology like compactness (4piA/P^2), mean width (2A/P), shape number (P/sqrt(A)), I also saw this answer to Calculating roundness/compactness of polygon?

My problem is that all these metrics are using some ratio of area and perimeter only. Even the Fractal Dimension index is using only area and perimeter (2ln(0.25P)/ln(A)). But how can I distinguish two polygons with the same area and perimeter but absolutely different shape? Like this branched polygon A:

branched polygon vs curved strip

which I tried to draw with the same area and perimeter as curved strip B. All my known indexes will be the same for them. But for me it's very important to differentiate simple strips (including curved like new moon) from complex branched shapes.

I intentionally show the polygon B as a curved strip and not a straight strip because I am aware of Related Circumcircle index which detects straight elongated shapes but my polygons may have also the same circumcircles. Even if I construct Convex hull and calculate a ratio of areas Apolygon/Aconvex, it may be very similar here.

So, how can I clearly distinguish branched polygon A from polygon B in vector data automatically? (Converting them to raster would require extremely small cell size, enormous dataset and a lack of memory, so it is not possible). Are there other shape indexes which include other parameters? Ideally, the method would distinguish not only clearly branched polygons but even C and D:

enter image description here

My only idea is to construct the convex hull then erase the polygon from it's convex hull and count the number of (big) pieces it leaves (erasing polygon by polygon and not the entire layer). This could show the border complexity.

I welcome mathematical solutions/algorithms, which I would later implement in Python.

mgri
  • 16,159
  • 6
  • 47
  • 80
nadya
  • 2,471
  • 1
  • 23
  • 35
  • 1
    You don't need much Python. Try ! shape!. convexHull (). symmetricDifference(!shape!) In field calculator. Try first on a small subset copy. See arcpy geometry help for correct syntax. – FelixIP Sep 06 '17 at 19:09
  • This could be a great question but at the moment you are asking multiple questions by not tying down whether you are asking about QGIS or ArcGIS Desktop and then also throwing in Python too. Once you specify precisely what you have tried it is easier for potential answerers to help with where you are stuck. I suggest focusing it on QGIS to avoid stranding your first answer. – PolyGeo Sep 06 '17 at 20:58
  • 1
    I have data in Esri geodatabase because a shapefile already was exceeding 2 gb. I could do something about this if there is a working solution in QGIS or somewhere. But I am not asking inside a specific software. I am asking about a metric, a method how to mathematically detect a shape with complex border (branched). 1 question. Scientific article with a formula would be also ok, I will think how to implement it myself. – nadya Sep 07 '17 at 10:25
  • Your last edit made this question even broader. For my vote to re-open I recommend that you focus it on QGIS so that you do not strand the existing answer. You can always ask about just an algorithm in a separate question. https://gis.meta.stackexchange.com/a/4339/115 – PolyGeo Sep 07 '17 at 11:18
  • @PolyGeo Sorry I don't understand. Should I mark this question as for QGIS only and create the same-text question as for algorithm only? (Actually QGIS is not good for me. I would need to create and process a shapefile of ~5 GB from ESRI geodatabase) – nadya Sep 07 '17 at 15:08
  • Yes - the initial broad phrasing meant that a QGIS answer was provided in good faith. If QGIS is no longer something that you want to ask this question about then just leave it as a Q&A that shows where you abandoned the effort to use it and start a specific question that asks about the algorithm only. – PolyGeo Sep 07 '17 at 20:51
  • @PolyGeo I can mark this question as Python question. This answer includes Python as well, and I will be glad to try to implement any algorithm in Python. Will be this ok? Thanks! – nadya Sep 07 '17 at 20:56
  • Personally I would always ask a question focused on a particular GIS software from the outset so that I can begin by saying not just what I want to do, but what I have tried and where I am stuck. A question seeking Python and/or algorithm solutions remains broad rather than focused but hopefully moves towards being focused without too much cross-answering. I'll vote to re-open for now to let that take its course. – PolyGeo Sep 07 '17 at 21:09
  • The [tag:python] tag does not encourage theoretical answers - that is what tags like [tag:algorithm] and [tag:gis-principle] are for. Since many product teams provide Python APIs and libraries the [tag:python] tag is itself very broad. – PolyGeo Sep 08 '17 at 00:14
  • Great question. Might be something useful in FRAGSTATS: http://www.umass.edu/landeco/research/fragstats/documents/fragstats.help.4.2.pdf – alphabetasoup Sep 08 '17 at 02:18
  • This is now a great question that I have upvoted. I wish we had more theoretical questions because once dissociated from implementation details they can attract great theoretical answers for others to implement using the software of their choice. – PolyGeo Sep 08 '17 at 02:37
  • @Richard Law Thanks but I am aware of FRAGSTATS and have cited this site right up in my question. It works with rasters only. They do have a different index (Contiguity index) but it is using moving window 3x3 pixels and cannot be applied to vector data. – nadya Sep 08 '17 at 17:20
  • 1
    My first thought was the same as yours, looking at the differences between the number and size of polygons left after subtracting the original from its convex (or concave) hull (see also alpha shapes). – user2856 Sep 11 '17 at 10:32
  • @FelixIP Thank you a lot for this code suggestion, it really works when I add .partCount. I can also write a Python code block with a condition to count only parts bigger than certain area or with compactness more than certain value. This can be a fast and easy way to roughly estimate "branchness" – nadya Sep 11 '17 at 17:52
  • 1
    If only skeleton was fast enough to compute I'd use to to compute 4A/PL, area, perimeter, length between farmost nodes of skeleton for compactness. Same applies to largest inscribed circle. – FelixIP Sep 11 '17 at 19:24

2 Answers2

13

You could have a look at the following method : skeletonize your polygons and rather work on line type features related to your original polygon with a unique source polygon ID. I guess there's some guesses to do (for example, when to consider a polyline as a real centerline : minimal length for a polyline to be eligible to centerline status). When the number of centerline is more then 1 for a single source polygon, then it's branched.

A branched polygon, when cleaned up to a center line, will have multiples lines whereas a straight polygon might have only one big line in the center (same as human interpretation in fact).

Example :

  • when you draw a Y letter, you use at least 2 continuous strokes (= 2 polylines),, so it's branched because the minimal number of strokes is > 1.
  • when you draw an L letter, you use at least 1 continuous stroke. It's not branched.

More examples of this logic :

  • When you draw a A : 2 strokes = it's branched
  • When you daw a B : 3 strokes = it's branched
  • when you draw a C : 1 stroke = it's not branched
  • etc

I haven't tried anything, just trying logic, but I think it could work.

See : Skeletonize vectors in QGIS/Python or http://postgis.net/docs/ST_StraightSkeleton.html

Or

Example

Source: Extracting centerline of a Complex-Polygon in PostGIS/Python

EDIT : For cases C & D, you need to already have filtered B shapes (non branched).

  • Make sure a unique ID links the centerline and the source polygon.
  • Transform your polygons into polylines
  • Densify the centerline polyline and the borderline polyline with regular points (not too much to avoid memory problems later but enough to "catch" the irregular bits.
  • Create a distance matrix between points of the centerline and the points of the borderline
  • Keep in the matrix lines only those where ID_centerline = ID_borderline
  • Create statistics to have a standard deviation value
  • Set a bound value to indicate for high SD values that it's a non regular contour and create the required indicator, for each unique ID
  • Get back the indicator to the original polygon by joining the field on the base of the unique ID.
gisnside
  • 7,818
  • 2
  • 29
  • 73
  • Thank you for the idea, I will try to create centerlines – nadya Sep 06 '17 at 16:43
  • Just, the problem to distinguish my polygons C and D will remain – nadya Sep 06 '17 at 16:47
  • You might need different methods for different cases and split the work. Once you have non branched polygons (B), you can refine B to try and find C and D. The problem is that i don't see what logic you use to distinguish C from D. you will probably have to phrase it clearly with criterias. – gisnside Sep 06 '17 at 17:18
  • 1
    The difference between C and D seems to be that in C, the sides of the polygon are approximately a uniform distance from the centerline, while in D the sides are a non-uniform distance from the centerline. – csk Sep 06 '17 at 18:22
  • 1
    @csk I see that. I guess translating that into code would be calculate statistics on the distance between centerline and border line. By densifying the border polyline with more points then converting this border into points + making a distance from the equivalent job on the centerline would give statistics on this behaviour. If the standard deviation is high, then probably the shape will be irregular. Hard to see how to do that on tousands of polygons though...nice challenge there – gisnside Sep 07 '17 at 05:04
  • Yes I was also thinking about distance to centerline. If there is no easy index or method, I will need to program in Python for every polygon. – nadya Sep 07 '17 at 10:32
  • @nadya I edited my answer to add the previous method – gisnside Sep 07 '17 at 12:53
  • Thank you for the detailed suggestion. This method seems reasonable. However I am having troubles with implementing - creating correct centerlines for my big data. Thiessen method (with densifying borders) creates way too much segments for polygons I consider linear: Picture. And the minimal valid length would be different for polygons of different size. – nadya Sep 11 '17 at 11:24
  • The result is not bad at all :) Maybe you could ponder the minimal length to select depending on the source polygon size and/or with the total linear length for each area ? Maybe you need to create a topological layer to make sure close linear lines are connected together and avoid removing consecutive lines – gisnside Sep 11 '17 at 12:14
2

In addition to the indices you mention and the excellent skeletonization idea, there are a few more indices that could be interesting if you can calculate buffers and convex hulls easily:

  • Fullness index: ST_Area(ST_Buffer(geometry, 0.177245385 * sqrt(ST_Area(geometry)))) / ST_Area(geometry)
  • Depth index: ST_Area(ST_Buffer(geometry, -0.177245385 * sqrt(ST_Area(geometry)))) / ST_Area(geometry)
  • Concavity index: 1 - (ST_Area(geometry) / ST_Area(ST_ConvexHull(geometry)))
  • Detour index: 3.5449077 * sqrt(ST_Area(geometry)) / ST_Perimeter(ST_ConvexHull(geometry))

(the expressions work for PostGIS and are taken from https://github.com/simberaj/rum/blob/master/rum/util.py)

For reference and explanation, you might want to see this article on compactness properties.

Jan Šimbera
  • 1,314
  • 9
  • 12