What is the best approach to create a very large shapefile (close to the 2 GB maximum) using Python? Best approach means robust and fast. Above all, fast.
-
Why the downvote? – LarsVegas Aug 14 '13 at 13:41
-
Maybe because of the lack of information in your question? IMHO best practice in your case is NOT TO USE shp files at all. Use a spatial database for this amount of data. – SS_Rebelious Aug 14 '13 at 13:48
-
Out of what will you create this shapefile? Surely there are some inputs or specifications; otherwise, the fastest way is to use an operating system call to copy an existing 2 GB shapefile :-). – whuber Aug 14 '13 at 15:23
-
Haha! Good one. Well, this question was intended to be open. There is one specific use case right now (~700,000 point geometries with two string attribute columns), but I was interested in general thoughts about fast ways to create a shapefile from scratch. – LarsVegas Aug 14 '13 at 15:56
-
@SS_Rebelious, in many cases I agree, since shapefiles have several limitations. However, in simpler circumstances they are still useful, e.g. they are supported by numerous systems and are quick and easy to handle. – Oyvind Aug 14 '13 at 17:31
-
For what it is worth, here are some short examples to create Shapefiles with GDAL/OGR or Fiona. – Mike T Sep 12 '13 at 23:45
3 Answers
I want to take the chance to promote OGR's virtual file system, which writes geometries to an in-memory dataset.
Using it is simple, as @Luke demonstrated in this post:
from osgeo import ogr
drv = ogr.GetDriverByName('ESRI Shapefile')
ds = drv.CreateDataSource('/vsimem/virtual.shp')
This works just great. Creating a point shapefile with ~300,000 geometries and two attribute fields takes less than two seconds. Impressive.
-
If you're dealing with a really large shapefile, won't this get slow once it eats all your memory and starts swapping? – Justin Dearing May 26 '15 at 15:05
A slow method is updating the shapefile on disk for each record. I would look for a Python library that lets you create an in-memory layer and then write the shapefile to disk in a single function call or transaction. Fiona/Shapely may offer something. Also, ogr2ogr will create a shapefile from a CSV, another shapefile, etc. ogr2ogr is part of GDAL, which is written in C/C++, so it should be fast, but it is not a pure Python solution. You'll want to define the shapefile attributes as tightly as possible (e.g. the narrowest string field widths that fit your data) to keep the shapefile as small as possible.
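To see why tight attribute definitions matter, here is a rough back-of-the-envelope sketch of the file sizes for the 700,000-point case from the comments. It assumes the standard layouts: a 100-byte `.shp` header plus 28 bytes per point record, and a `.dbf` with a 32-byte header, 32 bytes per field descriptor, a 1-byte terminator, a 1-byte deletion flag per record and a 1-byte end-of-file marker. The width of 80 stands in for a typical default string width and is an assumption, as are the function names.

```python
def point_shp_size(n_points):
    """Approximate .shp size in bytes for a plain 2D point layer."""
    header = 100              # fixed main file header
    record = 8 + 20           # record header + (shape type, x, y)
    return header + n_points * record


def dbf_size(n_records, field_widths):
    """Approximate .dbf size in bytes for fixed-width fields."""
    header = 32 + 32 * len(field_widths) + 1   # file header + descriptors + terminator
    record = 1 + sum(field_widths)             # deletion flag + field bytes
    return header + n_records * record + 1     # + end-of-file marker (assumed written)


n = 700_000
print("shp          :", point_shp_size(n))
print("dbf, 2 x 80  :", dbf_size(n, [80, 80]))   # assumed default width
print("dbf, 2 x 10  :", dbf_size(n, [10, 10]))   # tight widths
```

Under these assumptions the geometry file is about 19.6 MB either way, while the attribute table shrinks from roughly 112.7 MB to 14.7 MB when both string fields are cut from 80 to 10 characters. For point data the `.dbf` is the part that grows fastest.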
If Python is desired for this task, then the GDAL/OGR Python bindings are probably the best free option.
See HERE for a Python example showing how to create and write to a shapefile.