8

I was curious why clipping shapefiles was just so difficult and whether it was different to intersecting shapefiles?

You will see from one my questions that I gave up trying to clip a polyline with a polygon (here)

I tried various different methods:

  1. Conversion ogr2ogr (too slow)

    import subprocess
    subprocess.call(["C:\Program Files\GDAL\ogr2ogr", "-f", "ESRI Shapefile", "-clipsrc", clipping_shp, output_shp, input_shp])
    print('Done!')
    
  2. SQL ogr (too slow)

    from osgeo import ogr
    ogr.UseExceptions()
    ogr_ds = ogr.Open('K:/compdata/OS_Shape_Files/os_open_roads/trim', True)  # Windows: r'C:\path\to\data'
    start = time.clock()
    print('Starting:')
    print(ogr_ds)
    SQL = """\
    SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.*
    FROM ROADLINK_Project A, cut_se59ef_poly_60 B
    WHERE ST_Intersects(A.geometry, B.geometry);"""
     layer = ogr_ds.ExecuteSQL(SQL, dialect='SQLITE')
    # copy result back to datasource as a new shapefile
    layer2 = ogr_ds.CopyLayer(layer, 'h1_buf_int_ct3')
    # save, close
    layer = layer2 = ogr_ds = None
    print("Finished in %.0f seconds" % time.clock() - start)
    
  3. pyclipper (couldn't get this to work)

    solution = pc.Execute2(pyclipper.CT_INTERSECTION,     pyclipper.PFT_NONZERO, pyclipper.PFT_NONZERO)
    
  4. rTree (couldn't get this work as I have python 3.4)

    import fiona
    from shapely.geometry import shape, mapping
    import rtree
    
    bufSHP = produce_poly_shape
    intSHP = 'K:/compdata/OS_Shape_Files/os_open_roads    /trim/ROADLINK_Project.shp'
    ctSHP  = 'output.shp'
    
    start = time.clock()
    print('Starting')
    
    with fiona.open(bufSHP, 'r') as layer1:
    with fiona.open(ctSHP, 'r') as layer2:
        # We copy schema and add the  new property for the new resulting shp
        schema = layer2.schema.copy()
        schema['properties']['uid'] = 'int:10'
        # We open a first empty shp to write new content from both others shp
        with fiona.open(intSHP, 'w', 'ESRI Shapefile', schema) as layer3:
            index = rtree.index.Index()
            for feat1 in layer1:
                fid = int(feat1['id'])
                geom1 = shape(feat1['geometry'])
                index.insert(fid, geom1.bounds)
    
            for feat2 in layer2:
                geom2 = shape(feat2['geometry'])
                for fid in list(index.intersection(geom2.bounds)):
                    if fid != int(feat2['id']):
                        feat1 = layer1[fid]
                        geom1 = shape(feat1['geometry'])
                        if geom1.intersects(geom2):
                            # We take attributes from ctSHP
                            props = feat2['properties']
                            # Then append the uid attribute we want from the other shp
                            props['uid'] = feat1['properties']['uid']
                            # Add the content to the right schema in the new shp
                            layer3.write({
                                'properties': props,
                                'geometry': mapping(geom1.intersection(geom2))
                            })
    
    print("Finished in %.0f seconds" % time.clock() - start)
    

Finally, I looked at this but couldn't get it to work with python 3.4.

Then in R I tried:

  1. gIntersection(x,y, byid=TRUE)

  2. gDifference

  3. custom gClip command I found online

However with a large polylines shapefile (around 500mb) and a very small polygon shapefile (1mb) the intersection would take a day. This led me to think perhaps I am doing something incredibly stupid and there is a quick clip command?

For example, in arcGIS the intersection command takes a few seconds so surely clipping is even easier? I know very little about GIS but to me it seems as simple as aligning the two layers by one coordinate (assuming same projection) and then selecting the outside of the clipper shape (e.g. in paint) and just deleting it from the other layer. I guess obviously this isn't the case ..

Thus my goal: I create a polygon in my code and I want to clip the UK roads network I load in to the extent of that polygon, buffer the points, dissolve and then output as an array containing the polygon coordinates.

I don't need to retain any features that an intersection would provide just a straight clip. I guess I also don't strictly need shape files and could convert my 500mb UK road network into a geodatabase?

Ilia
  • 335
  • 2
  • 10
  • not sure that this is what you are looking for, but you should tak a look at concave hull http://gis.stackexchange.com/questions/1200/what-are-definition-algorithms-and-practical-solutions-for-concave-hull – radouxju Sep 25 '15 at 15:15
  • @radouxju, Radouxju, thank you for your reply! I tried creating a concave hull by piecing together a few functions I found online - and this may just be my code but I found that polygon would be too 'envelope/convex-hull'. For example: http://oi61.tinypic.com/15xr0w1.jpg – Ilia Sep 25 '15 at 15:35
  • Place multiple points along each road segment, assign time to each. Interpolate time using this points, convert to raster. Calculate eucledean distance to road, divide it by assumed walking velocity. Sum both rasters, reclass into desired intervals. The slower your pedestrian, the more alpha shaped polygons you'll get – FelixIP Sep 25 '15 at 21:16
  • Hi All, I think I have made some progress. I think alpha-shapes is the way to go for the boundary line. However, the Cgal library appears not to work with Python 3 so I have carried on with the code I have for creating alpha shapes. This works well when the points are in a circle shape: http://oi60.tinypic.com/243s385.jpg and I can specify an alpha of 500; however (1) sometimes the polygon breaks into multiple polygons and (2) drops certain points. Is it possible to automatically set the alpha as high as possible but low enough to avoid the two points above? – Ilia Sep 28 '15 at 11:41
  • Hi All, I have edited my original post with a completely different method to alpha-shapes which I would really appreciate help with – Ilia Oct 11 '15 at 09:46
  • 1
    GDAL must do it somehow wrong. I made a test with OSM roads (Switzerland set from geofabrik, 183 MB) and OpenJUMP makes the intersection in 18 seconds but I stopped ogr2ogr after 15 minutes. My clip polygon: POLYGON (( 7.801 47.06, 7.717 47.07, 7.709 47.132, 7.723 47.186, 7.775 47.205, 7.83 47.205, 7.874 47.177, 7.892 47.113, 7.882 47.061, 7.848 47.055, 7.801 47.06 )) – user30184 Oct 24 '15 at 15:00
  • Please clarify what you mean with clipping and intersection and what you think to be the difference between them? Are you perhaps mixing "intersection" that clips the features from the intersecting points and "intersects" that only selects the features as they are without clipping them. – user30184 Oct 24 '15 at 15:09
  • I agree with user30184. What result do you want that intersect does not provide? A picture showing a small example of what you want to achieve would be very useful. If Clip involves dissolving shapes, that may create huge geometries with too many vertices and too much complexity to process efficiently, depending on your input shapefile. I also personally never use shapefiles to do any geoprocessing. They are outdated and cause poor performance with large amounts of data. A file geodatabase is all I use for geoprocessing. The time spent converting a shapefile to geodatabase is worth it. – Richard Fairhurst Oct 24 '15 at 18:42
  • I updated my original post - I just want to keep the roads (lines) that fall within my polygon I.e clip road network to say a small village – Ilia Oct 25 '15 at 15:12
  • 2
    For the small village ogr2ogr will be much faster if you add -spat xmin ymin xmax ymax. Without that ogr2ogr does not utilize spatial index at all. You must naturally have spatial index, use shptree utility for creating .qix index if you do not have such. Did I understand right that you really want to clip so that if a long road is crossing the village border you want to cut the road geometry and only keep the section that is inside the village area? – user30184 Oct 25 '15 at 16:39
  • @user30184 Thank you so much for this! I can give a bounding box as I can figure out my min and max; latitude and longitude. I will try this now and see if I get faster results. You correct that I only want a clip (and so I want to cut the road); I just want the road inside the polygon. Is this the utility you mention: http://mapserver.org/uk/utilities/shptree.html If so, would I be able to attach some kind of index to my roads shape file that ogr2ogr will automatically read? – Ilia Oct 25 '15 at 18:07
  • 2
    That's the tool but it comes with GDAL so you have it already. Run it and .qix file appears and GDAL will utilize it. Coordinates for -spat are given in the same units than used in source data and not always as lon-lat. Take ogrinfo from your clip.shp and use extents in -spat. – user30184 Oct 25 '15 at 18:24
  • @user30184 Thanks again! You have really helped me out a lot with this. The -spat command does wonders. It only takes 5 minutes now (compared to hours before). The index doesn't really help (I think it saves me 30 seconds) but I'm not sure if I created it correctly. I have added some details from tests I ran stemming from this (as an answer) and would be very grateful if you could have a look through! – Ilia Oct 26 '15 at 10:28
  • I finally completed this task here: http://gis.stackexchange.com/a/167789/59826 – Ilia Oct 26 '15 at 15:28

2 Answers2

4

I wanted to post the below in-case it helps anyone. I managed to improve my clipping time considerably to 35 seconds by removing any holes in my polygon (fortunately I didn't need these holes anyway so a double-win).

The code below creates a rough polygon from points which are reachable within X minutes (by buffering); intersects them with a road network; then buffers the road-network slightly to create a nice isochrone. (Only thing I would like to add is a way to join the polygons along the road)

These match arcGIS generated isochrones quite well:

enter image description here

import subprocess
import fiona
from shapely.geometry import Point, mapping, shape, MultiLineString, MultiPolygon, Polygon, LineString
from shapely.ops import unary_union
from fiona.crs import from_epsg

# Projections:
# from pyproj import Proj, transform
# wgs84=Proj("+init=EPSG:4326")  # LatLon with WGS84 datum used by GPS units and Google Earth
# osgb36=Proj("+init=EPSG:27700")  # UK Ordnance Survey, 1936 datum
# original = Proj(init='EPSG:4326')
# destination = Proj(init='EPSG:4326')

# Create shape-file of Intersected
# Note try using dictreader
pts = []
with open(produce_csv_file) as f:
    for x in csv.reader(f):
        # Lat, Lng, Minutes and keep those points within
        if float(x[2]) <= float(duration):
            # Shapely:
            pts.append(Point(
                # transform(original,destination,float(x[1]), float(x[0]))
                float(x[1]), float(x[0])
            ))

# Buffer (divide spacing by 100)
# Buffer + 10%
buffer = [point.buffer(1.1*float(RADIUS_KM)/100) for point in pts]

# Union
merged = unary_union(buffer)

# Remove holes
exterior_polys = []
for poly_l in merged:
    exterior_polys.append(Polygon(poly_l.exterior))
merged = MultiPolygon(exterior_polys)

# Save the interim shape-file
schema = {
    'geometry': 'Polygon',
    'properties': {'id': 'int'},
}

with fiona.open(produce_poly_shape, "w", driver='ESRI Shapefile', crs=from_epsg(4326),
                schema=schema) as output:
    output.write({'geometry': mapping(merged), 'properties': {'id': 123}})

# Bounds
bounds = merged.bounds
x_min = '%.5f' % bounds[0]
x_max = '%.5f' % bounds[2]
y_min = '%.5f' % bounds[1]
y_max = '%.5f' % bounds[3]
print(x_min, x_max, y_min, y_max)

# Clip using GDAL
# http://www.gisinternals.com/release.php
print("Generation Time to Rough Shapefile: %.0f seconds" % (time.clock() - start))
print("Beginning Clip")
start = time.clock()

#Make sure shape-file has an index
#ogrinfo "../ROADLINK_Project.shp" -sql "CREATE SPATIAL INDEX ON ROADLINK_Project"
subprocess.call(["C:\Program Files\GDAL\ogr2ogr", "-f", "ESRI Shapefile", "-spat", x_min, y_min, x_max, y_max, "-clipsrc", produce_poly_shape, interim_poly_shape, my_roads_shape])

print('Clipped in %.0f seconds' % (time.clock()-start))

road_lines = ([shape(pol['geometry']) for pol in fiona.open(interim_poly_shape)])
buffer = []
# Buffer + 10%
for road in road_lines:
    buffer.append(road.buffer(1.1*float(RADIUS_KM)/100))

# Union
merged = unary_union(buffer)

# Remove holes
exterior_polys = []
for poly_l in merged:
    exterior_polys.append(Polygon(poly_l.exterior))
merged = MultiPolygon(exterior_polys)

# Save the final shape-file
schema = {
    'geometry': 'Polygon',
    'properties': {'id': 'int'},
}

with fiona.open(final_poly_shape, "w", driver='ESRI Shapefile', crs=from_epsg(4326),
                schema=schema) as output:
    output.write({'geometry': mapping(merged), 'properties': {'id': 123}})

print('Complete! Total time-taken: %.0f' % (time.clock()- overall_start))
Ilia
  • 335
  • 2
  • 10
1

@user30184 helped me to make pretty good progress and I want to share my results.

I save the bounds of my small clipper polygon:

# Bounds
bounds = merged.bounds
x_min = '%.5f' % bounds[0]
x_max = '%.5f' % bounds[2]
y_min = '%.5f' % bounds[1]
y_max = '%.5f' % bounds[3]

And then run the following command:

subprocess.call(["C:\Program Files\GDAL\ogr2ogr", "-f", "ESRI Shapefile", "-spat", x_min, y_min, x_max, y_max, "-clipsrc", clipping_shp, output_shp, input_shp])

Specifying --spat -> this makes my command much faster (more details below).

I tried to also create an index:

ogrinfo "../ROADLINK_Project.shp" -sql "CREATE SPATIAL INDEX ON ROADLINK_Project"

However, didn't notice any speed improvement. I think this is because I create the shape using arcGIS and thus ogr2ogr uses the arcGIS index. So I ran a few experiments (my .shp is 504MB, my arcGIS index .shx is 24MB and my crated quadtree .qix index is also around 25MB ... I wasn't sure what to set the recursion level to and so used the default)

1) Removing all index files and adding the default index using ogrinfo it takes 592 seconds to intersect a 500mb UK Roads shapefile with a 5mb polygon using the ogr2ogr command

2) Using arcGIS it takes 56 seconds to perform an intersection

3) Using arcGIS it takes 105 seconds to perform a clip

4) If I remove the index file and re-run the ogr2ogr command it takes 621 seconds (so it appears that the index file saves me 30 seconds).

PolyGeo
  • 65,136
  • 29
  • 109
  • 338
Ilia
  • 335
  • 2
  • 10
  • 1
    For testing the effect of spatial index with GDAL you should test one thing at a time. Run your command with -spat but without clip and you should feel the difference. GDAL can't still utilize spatial index for clipping so it is testing each linestring that is left after applying -spat with the clipping polygon. That is much extra work because lots of linestrings and their bboxes are totally within your village polygon and there is for sure no need to clip them. You know that 600 seconds is not a good time if another program does the job in 60 seconds. It is up to you if you can tolerate it. – user30184 Oct 26 '15 at 11:50
  • @user30184 I think I realised what was slowing this down (the 600 seconds) -> my multipolygon object has lots and lots of holes (after I buffer points) which mean a lot more intersecting needs to be done. Is there a way in shapely to fill all holes for all polygons in my multipolygon object? – Ilia Oct 26 '15 at 13:48
  • Could you create a convex hull with shapely? – user30184 Oct 26 '15 at 14:02
  • @user30184 A convex hull I think would not be particularly suited for my analysis as if a point is outside the polygon I do not want it included; however I have managed to fill in the holes using this: "# Remove holes exterior_polys = [] for poly_l in merged: exterior_polys.append(Polygon(poly_l.exterior)) merged = MultiPolygon(exterior_polys)" And now my clipping takes 50 seconds! – Ilia Oct 26 '15 at 14:14