2

What is the best approach to create a very large shapefile (close to the 2 GB maximum) using Python? "Best" here means robust and fast. Above all, fast.

LarsVegas
  • Why the downvote? – LarsVegas Aug 14 '13 at 13:41
  • 10
    Maybe because of the lack of information in your question? IMHO best practice in your case is NOT TO USE shp-files at all. Use a spatial database for such an amount of data. – SS_Rebelious Aug 14 '13 at 13:48
  • 1
    Out of what will you create this shapefile? Surely there are some inputs or specifications--otherwise, the fastest way is to use an operating system call to copy an existing 2 GB shapefile :-). – whuber Aug 14 '13 at 15:23
  • 1
    Haha! Good one. Well, this question was intended to be open. There is one specific use case right now (~700.000 point geometries with two attribute columns (strings)), but I was interested in general thoughts about fast ways to create a shapefile from scratch. – LarsVegas Aug 14 '13 at 15:56
  • 1
    @SS_Rebelious, in many cases I agree, since they have several limitations. However, in simpler circumstances shapefiles are still useful, e.g. they are supported by numerous systems, quick and easy to handle, etc. – Oyvind Aug 14 '13 at 17:31
  • For what it is worth, here are some short examples to create Shapefiles with GDAL/OGR or Fiona. – Mike T Sep 12 '13 at 23:45

3 Answers

15

I want to take the chance to promote OGR's virtual file system, which writes geometries to an in-memory dataset.

Using it is simple, as @Luke demonstrated in this post:

from osgeo import ogr

# A path under /vsimem/ keeps the dataset in memory instead of on disk.
drv = ogr.GetDriverByName('ESRI Shapefile')
ds = drv.CreateDataSource('/vsimem/virtual.shp')

This works just great. Creating a point shapefile with ~300.000 geometries and two attribute fields takes less than two seconds. Impressive.
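To make the two lines above a bit more concrete, here is a minimal sketch of the whole round trip: build the layer under /vsimem, then copy it to disk in one step. The function name, the (x, y, name, kind) tuple layout, the field names and the width of 20 are all made up for illustration, and copying out with CopyDataSource is just one way to get the result onto disk, not necessarily what the linked post does.

```python
def write_points_via_vsimem(points, out_path, width=20):
    """Build a point shapefile in /vsimem, then copy it to out_path.

    points: iterable of (x, y, name, kind) tuples (hypothetical layout).
    Returns the number of features found in the copied shapefile.
    """
    # Imported here so this module still loads without the GDAL bindings.
    from osgeo import ogr

    drv = ogr.GetDriverByName("ESRI Shapefile")
    mem_path = "/vsimem/virtual.shp"
    mem_ds = drv.CreateDataSource(mem_path)
    layer = mem_ds.CreateLayer("virtual", None, ogr.wkbPoint)

    # Two string attributes with an explicit (assumed) width.
    for field_name in ("name", "kind"):
        field_defn = ogr.FieldDefn(field_name, ogr.OFTString)
        field_defn.SetWidth(width)
        layer.CreateField(field_defn)

    layer_defn = layer.GetLayerDefn()
    for x, y, name, kind in points:
        feat = ogr.Feature(layer_defn)
        feat.SetField("name", name)
        feat.SetField("kind", kind)
        geom = ogr.Geometry(ogr.wkbPoint)
        geom.AddPoint_2D(x, y)
        feat.SetGeometry(geom)
        layer.CreateFeature(feat)

    # Copy the finished in-memory dataset to disk in a single call.
    mem_ds.FlushCache()
    out_ds = drv.CopyDataSource(mem_ds, out_path)
    count = out_ds.GetLayer(0).GetFeatureCount()

    out_ds = None  # dropping the reference closes and flushes the file
    mem_ds = None
    drv.DeleteDataSource(mem_path)  # free the in-memory files
    return count
```

Calling `write_points_via_vsimem([(1.0, 2.0, "a", "x")], "points.shp")` should leave `points.shp`, `points.shx` and `points.dbf` next to each other on disk.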

LarsVegas
3

A slow method is updating the shapefile on disk for each record. I would look for a Python library that lets you create an in-memory layer and then write the shapefile to disk in a single function call or transaction. Fiona/Shapely may offer something. Also, ogr2ogr will create a shapefile from a CSV file, another shapefile, etc. ogr2ogr is written in C/C++, so it should be fast, but it is not a pure Python solution. You'll want to define the shapefile attributes as tightly as possible (for example, the narrowest field widths that fit your data) to keep the shapefile as small as possible.
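To see why tight attribute definitions matter, here is a rough back-of-the-envelope estimate based on the shapefile and dBASE file layouts: every record in the .dbf takes one flag byte plus the full declared width of every field, regardless of how short the stored value is. The function name is made up, and the width of 80 used below is what I believe OGR assigns to string fields when no width is set, so treat that as an assumption.

```python
def estimate_point_shapefile_bytes(n_records, field_widths):
    """Rough size of the .shp/.shx/.dbf files for a 2D point shapefile."""
    # .shp: 100-byte header; each point has an 8-byte record header
    # plus 20 bytes of content (shape type, x, y).
    shp = 100 + n_records * (8 + 20)
    # .shx: 100-byte header plus one 8-byte index entry per record.
    shx = 100 + n_records * 8
    # .dbf: 32-byte header, 32 bytes per field descriptor, 1 terminator
    # byte; each record is 1 flag byte plus the declared field widths;
    # 1 end-of-file byte.
    dbf_header = 32 + 32 * len(field_widths) + 1
    dbf = dbf_header + n_records * (1 + sum(field_widths)) + 1
    return {"shp": shp, "shx": shx, "dbf": dbf}


# 700,000 points with two string fields: width 80 vs. width 20.
wide = estimate_point_shapefile_bytes(700_000, [80, 80])
tight = estimate_point_shapefile_bytes(700_000, [20, 20])
print(wide["dbf"], tight["dbf"])  # 112700098 28700098
```

With these numbers the geometry stays at about 19.6 MB either way, while the attribute table shrinks from roughly 113 MB to roughly 29 MB, so the field widths dominate the total size for simple point data.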

klewis
3

If Python is desired for this task, then the GDAL/OGR Python bindings are probably the best free option.

See HERE for a Python example showing how to create and write to a shapefile.

Oyvind