70

Recently I've been spending a lot of time converting perfectly good field names like "Percent of citizens age 25 and over with a bachelor's degree or higher" into things like "edbchogtr" to meet the DBF's 10 character field name limit.
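
The renaming itself can at least be scripted. Below is a minimal sketch, assuming that plain truncation plus a numeric suffix on collisions is acceptable; the helper name `shorten_field_names` is hypothetical, and the results are no more readable than "edbchogtr".

```python
import re

DBF_MAX_FIELD_LEN = 10  # the DBF field-name limit mentioned above


def shorten_field_names(names, max_len=DBF_MAX_FIELD_LEN):
    """Map each long name to a unique identifier of at most max_len characters.

    Hypothetical helper for illustration only; not part of any GIS library.
    """
    used = set()
    mapping = {}
    for name in names:
        # Keep only lowercase letters and digits, then truncate.
        base = re.sub(r"[^0-9a-z]", "", name.lower())[:max_len] or "field"
        candidate = base
        counter = 1
        # On a collision, overwrite the tail with a numeric suffix.
        while candidate in used:
            suffix = str(counter)
            candidate = base[: max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


long_names = [
    "Percent of citizens age 25 and over with a bachelor's degree or higher",
    "Percent of citizens age 25 and over with a high school diploma",
]
for long_name, short_name in shorten_field_names(long_names).items():
    print(f"{short_name:<10}  {long_name}")
```

The mapping still has to be stored somewhere outside the DBF, since the short names carry almost no meaning on their own.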

In another thread (“Oddities” in the Shapefile technical specification), geospatialpython commented that "Despite the shapefile format's flaws, oddities, and limitations it persists stubbornly in and around the field of GIS. Every other attempt to replace it has been too bloated for simple vector storage or too proprietary."

This activity coupled with Mr. Lawhead's comment has me wondering:

  • Have any explicit attempts ever been made to replace the shapefile as GIS's ubiquitous data storage and interchange format?
  • Are there any contenders?
  • If there have been competing formats, why have they failed?
  • Has Esri refused to support them, or is the story simply one of technological inertia?
  • If there haven't been attempts... why not?

It seems like we could do a little better for ourselves, both as GIS developers and users.

canisrufus
  • Geodatabase? but then the shapefile never had true topology. – Mapperz Feb 20 '12 at 20:52
  • 2
    @Mapperz Other than the recently released Geodatabase API, I don't see any tools for writing a geodatabase that are free. I don't think this could count as a replacement except in the ESRI portion of the world. – canisrufus Feb 20 '12 at 21:23
  • 2
    You can write and read geodatabases (via API) using GDAL http://www.gdal.org/ogr/drv_filegdb.html using http://resources.arcgis.com/content/geodatabases/10.0/file-gdb-api – Mapperz Feb 20 '12 at 22:12
  • Oops, I searched this doc table for "geodatabase" and read that it won't write personal geodatabases. I missed that it writes FileGDBs. – canisrufus Feb 20 '12 at 22:17
  • Shapefiles and Personal Geodatabases (an MS Access database) are limited to 2GB. That is not very much data in today's terms... so I would recommend File Geodatabases – Mapperz Feb 20 '12 at 22:20
  • Ah, an important difference. – canisrufus Feb 21 '12 at 01:23
  • @canisrufus technically, you can also use the ArcObjects driver http://www.gdal.org/ogr/drv_ao.html that writes to any ESRI supported format (as long as you have an ESRI license). – Ragi Yaser Burhum Feb 21 '12 at 01:52
  • 1
    Would like to see Python API to read/write File Geodatabase (at least Simple Features) without ArcGIS license - that would be Open. – PolyGeo Feb 21 '12 at 04:16
  • 2
    @PolyGeo you and everyone else :) – Ragi Yaser Burhum Feb 21 '12 at 18:21
  • @Mapperz Are you certain about the 2GB file limit for shapefiles? I generated some shapefiles that were 3GB recently and was able to open them in ArcMap. – djq Feb 21 '12 at 18:46
  • 3
    @celenius From http://www.gdal.org/ogr/drv_shapefile.html "Geometry: The Shapefile format explicitly uses 32bit offsets and so cannot go over 8GB (it actually uses 32bit offsets to 16bit words). Hence, it is not recommended to use a file size over 4GB. Attributes: The dbf format does not have any offsets in it, so it can be arbitrarily large."

    So you can have dbfs that are pretty big, but you have to be careful with your shp going over 4GB. Then you are playing with fire.

    – Ragi Yaser Burhum Feb 21 '12 at 19:08
  • @RagiYaserBurhum I see. 4GBs bad, 2GBs good. Interestingly the shapefile size limited the file size I could load into PostGIS (on Win 32bit). Completely separate problem, but it made me pay attention to how big my file was. – djq Feb 21 '12 at 20:03
  • 1
    If DBF or SHP goes beyond 2 GB you are likely to run into problems on many systems. Beyond 4 GB won't work with SHP at all and I think DBF would have the same problem. In theory it should work, but in reality most software is using signed 32-bit integers for opening them. – Uffe Kousgaard Feb 21 '12 at 21:59
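
The three limits mentioned in these comments (2, 4 and 8 GB) all follow from the same arithmetic. A small sketch, assuming, as the GDAL note quoted above says, that .shp offsets are 32-bit values counting 16-bit words; the variable names are only for illustration.

```python
GIB = 2**30
WORD_BYTES = 2  # .shp offsets count 16-bit words, per the GDAL note quoted above

# Offsets treated as unsigned 32-bit word counts: the hard ceiling of the format.
unsigned_word_limit = 2**32 * WORD_BYTES

# The same offsets read as signed 32-bit integers: half of that.
signed_word_limit = 2**31 * WORD_BYTES

# Software that tracks file positions as signed 32-bit *byte* counts.
signed_byte_limit = 2**31

for label, limit in [
    ("unsigned 32-bit word offsets", unsigned_word_limit),
    ("signed 32-bit word offsets", signed_word_limit),
    ("signed 32-bit byte positions", signed_byte_limit),
]:
    print(f"{label}: {limit // GIB} GiB")
# prints 8 GiB, 4 GiB and 2 GiB respectively
```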

7 Answers

52

This is a topic that always comes up. I may not have the right answer, but I can give you my personal opinion.

The reason shapefiles are still supported can be attributed to several of their characteristics, so let me mention a few.

  • First, there is a spec. I mean, I am in my early thirties and this thing has existed since I was a teenager. So it is safe to say that this spec has been around for some time. Of course, there are several other formats that are also published, but the difference about this one is that...

  • It is relatively simple! It is built on top of the DBF Format, which at the time already existed and was widely supported in several platforms/OSs. There were already parsers that could read half of this format (the DBF part), so it made supporting the extra addition easier. You have a geometry? Sure just serialize it and write it. You are done. Contrast this with a coverage! Try to explain to somebody in simple terms what a topology clean does. It is not trivial to write a topologically clean coverage.

  • Most importantly, I think the #1 reason for shapefiles to still be popular is that they are supported in both Open Source and Proprietary systems alike. What GIS do you know that doesn't support shapefiles?!? Unheard of.
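
To give a feel for how simple the published part is, here is a rough sketch that writes and reads back the 100-byte main file header of a .shp file using nothing but `struct`. The field layout follows my reading of the published Esri shapefile specification (file code 9994, length in 16-bit words, version 1000); the function names `build_header` and `parse_header` are hypothetical.

```python
import struct

FILE_CODE = 9994  # magic number at the start of every .shp file
VERSION = 1000


def build_header(file_length_words, shape_type, bbox):
    """Pack a 100-byte .shp header; Z and M ranges are left at 0.0."""
    xmin, ymin, xmax, ymax = bbox
    # Bytes 0-27: seven big-endian int32s (file code, five unused, file length).
    big = struct.pack(">7i", FILE_CODE, 0, 0, 0, 0, 0, file_length_words)
    # Bytes 28-99: little-endian version, shape type, then eight doubles.
    little = struct.pack(
        "<2i8d", VERSION, shape_type, xmin, ymin, xmax, ymax, 0.0, 0.0, 0.0, 0.0
    )
    return big + little


def parse_header(data):
    """Unpack the fields most readers need from a .shp header."""
    file_code, *_unused, length_words = struct.unpack(">7i", data[:28])
    version, shape_type = struct.unpack("<2i", data[28:36])
    bbox = struct.unpack("<4d", data[36:68])
    if file_code != FILE_CODE:
        raise ValueError("not a shapefile header")
    return {
        "file_length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": shape_type,  # 1 means Point in the spec
        "bbox": bbox,
    }


# A header-only (empty) point shapefile is 50 words = 100 bytes long.
header = build_header(50, 1, (0.0, 0.0, 1.0, 1.0))
print(parse_header(header))
```

The mix of big-endian and little-endian fields in one header is one of the oddities people complain about, but it is still only a few lines of parsing.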

As a replacement, we hear of File GeoDatabases and Spatialite. Both formats are vastly superior in terms of functionality, flexibility, speed, etc. when compared to Shapefiles. In their own way, they have certain things that make them better than each other in different areas, but a comparison of spatialite and FileGDB is certainly out of the scope of this question.

Do I think that either of these formats will replace Shapefiles? Not in their current incarnations.

Why?

Not because of a technological argument (I did say they were superior in that aspect after all), but because of something else: licensing.

So what are their problems?

FileGDB:

FileGDB provides interoperability through the new FileGDB API. Nevertheless, this API is provided in binary format by ESRI. This is not a specification. Having worked in the GeoDatabase team in the past, I can tell you, contrary to all the tin-foil-hat-wearing conspiracy theorists, this is not malicious at all. It is because the internals of the GeoDatabase change on every release. Publishing a full spec would entail basically giving all the details of how everything is supposed to be maintained and then carefully documenting the changes to the format with every yearly release. It doesn't make sense. So the FileGDB API, even though it is not a spec, it abstracts out all those little changes. And now it can be used cross-platform! Mind you, this is a huge step forward! Considering the conservative nature of ESRI, this is definitely a reaction in the right direction.

And yet, binary-only support doesn't make anybody in the Open Source world too happy. How do you port some code to, say, some other flavor of Linux if ESRI doesn't support it? You can't. Being able to do that is what makes Open Source powerful, and here you cannot take advantage of it. If ESRI decides to stop supporting Debian, that's it. You are done. And there is nothing you can do to change it.

Spatialite:

Spatialite is awesome because it gets all the free functionality from SQLite. SQLite is used everywhere. It is on your Android Phone, on your iPhone/iPad, in Firefox, in Google Chrome, on several commercial embedded devices - I could go on forever. To truly make it into a Geoformat (and not just do dumb bounding box operations), it needs to leverage the same geometry library that PostGIS uses: GEOS. Sadly, GEOS is based on another even more awesome geometry library known as JTS. All the algorithms in JTS are extremely powerful, so what is the problem?

Well, JTS is licensed as Open Source LGPL, and LGPL is a viral license. JTS is LGPL, means GEOS is LGPL, means spatialite linked statically with GEOS is LGPL. This sucks. Why? Without explaining open source licenses too much, I can tell you that, for example, I cannot use spatialite on, say, an iPhone app because that would make my entire app automatically open source (iOS only allows static linking). Any type of GPL license (reasonably) scares the crap out of ESRI, and so they will not touch it with a 10 foot pole. Hence, ArcGIS, the most popular GIS system in the world does not (and will probably never) support spatialite natively. This automatically kills it as a viable format.

And thus we go back to crappy shapefiles that are supported everywhere.

Update:

Apparently my answer was controversial enough that someone decided it was OK to freely edit and change the entire meaning of my answer to put their point of view. Please don't do that. If you disagree with me, that is completely fine, just post your opinion in a different answer and let the community decide. I rolled back the edits to my answer to show the original meaning. I am adding this update in case you read the edited answer that claimed that sqlite was a viable format.

Ragi Yaser Burhum
  • The problem with SQLite/Spatialite is that it's not a format, it's a relational database engine with spatial library on top of it. While it does what it does very well, it forces the data to be stored in the relational manner, which isn't always the most suitable way. Also, the complexity of SQLite file format (http://www.sqlite.org/fileformat2.html) makes it difficult to access the data without the SQLite engine and is thus not suitable to be an open & easily accessible file format for data exchange. It wasn't really designed for that. – Igor Brejc Feb 21 '12 at 06:31
  • 8
    Actually, LGPL isn't a viral license - it was specifically designed to avoid this. Additionally, Spatialite is licensed under the MPL tri license (source), meaning among other things you can choose the Mozilla Public License as the best fit license and operate under its (very weak copyleft) terms. My reading at least is that ESRI have no reason not to support Spatialite because of the license - whether they will (given it competes in almost the same space as FileGDB) is another story... – om_henners Feb 21 '12 at 13:49
  • @IgorBrejc I do realize spatialite is more than a format. You can argue the same about FileGDB though. FileGDB (like any other GeoDatabase) brings object relational behaviors with several GIS specific concepts: think geometric networks, topologies, representation classes, tins, cadastral datasets, etc. In that regard, it is even further out from shapefiles than spatialite, but yet it is still brought up (correctly) as an option on this discussion. Hence, why I bring spatialite. – Ragi Yaser Burhum Feb 21 '12 at 16:14
  • @om_henners I was really hesitant to bring the viral nature of LGPL into this context because I realize some people feel extremely offended about calling the GPL viral. I don't want to turn this thread into a licensing issue. The argument remains though. ESRI will not support spatialite natively because it is LGPL, and other platforms that only allow static linking can't leverage it without LGPLing themselves. – Ragi Yaser Burhum Feb 21 '12 at 16:20
  • @om_henners Also, derivative works, include looking at source code and creating a port. Since the original code is in Java and GEOS is in C++, I can tell you this is why GEOS is LGPL. No way around it. That is viral, no? – Ragi Yaser Burhum Feb 21 '12 at 16:24
  • @Ragi as long as you dynamically link LGPL software, there's no virality – Igor Brejc Feb 21 '12 at 16:37
  • @IgorBrejc If you read my reply, you will find an example of a case where no linkage whatsoever is required and you still get virality. The port from Java's JTS to GEOS in C++ has no linkage whatsoever and you still have to license it as LGPL. – Ragi Yaser Burhum Feb 21 '12 at 17:30
  • @IgorBrejc Even worse, as I stated in my answer, certain platforms only allow you to statically link (iOS) as a 3rd party developer. This makes leveraging LGPL code in iPhone/iPads worse than inconvenient. – Ragi Yaser Burhum Feb 21 '12 at 17:33
  • 4
    @Ragi, you mix using a library and porting it. Of course porting will have to be LGPL, since this is in essence a derivative work. But if you link it dynamically, it is not considered a derivative work, it's "work that uses the library" and you get to keep your license (http://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License). So saying "LGPL is viral" without additional explanation is not accurate. – Igor Brejc Feb 21 '12 at 20:34
  • 2
    But again, this is a moot point, since Spatialite is licensed under a tri-license scheme (https://groups.google.com/forum/?fromgroups#!topic/spatialite-users/Rkm97hShD1Y), so you get to choose the license that suits you most - MPL allows static linking. – Igor Brejc Feb 21 '12 at 20:34
  • @Igorbrejc spatialite, without geos is not even half as useful. A choice could have been made to use the boost geometry library instead of geos and you could have had a far more permissive license for platforms like iOS. But that is a separate discussion. My answer remains, ESRI will not touch anything *GPL on it. If you want to discuss the licensing issues that people see about spatialite, we should start a new question. It would be more appropriate. – Ragi Yaser Burhum Feb 22 '12 at 00:19
  • @canisrufus thanks man! I was starting to think this thread was turning into a licensing one reminiscent of the thousands of BSD vs GPL that are out there and I was beginning to regret answering – Ragi Yaser Burhum Feb 22 '12 at 00:21
  • @igorbrejc I guess one clarification that I owe you is that I think spatialite without geos or proj is pretty useless. Using Spatialite without projections, intersections, buffers, correct results for any of the Clementini operators, etc. is not really using it. At that point, you might as well use plain sqlite without spatialite. For that setup, say, on my iPhone, there is no tri-license option. It is pure LGPL. That's my personal opinion. If you asked ESRI to support it, the cost-benefit analysis takes it out of the equation. Benefit: support an OS format that can replace my GDBs. Risk: LGPLing ArcMap – Ragi Yaser Burhum Feb 22 '12 at 16:21
  • I know this is really delayed off the original, but as an update in case anyone else finds this - ESRI quietly (very quietly) added Spatialite support to ArcGIS 10.2. So far, using it is pretty seamless, but the databases can only be created in ArcPy (but used everywhere). Converting tables with large text fields also seems to be a problem, but it's otherwise workable – nicksan Jun 07 '14 at 20:30
  • According to this answer, you can statically link an LGPL app if you also provide, on request, the means to build the application to anyone that you've distributed your binaries. That means all source of LGPL libraries you've used, and the objects (but not source) of your proprietary code. You don't have to distribute the source with every binary, and you don't have to make your binary objects public. Just available to interested customers. – dericke Jul 01 '15 at 19:29
  • Hey @RagiYaserBurhum, "ESRI will not support spatialite natively because it is LGPL and other platforms that only allow static linking, cant leverage it without LGPLing themselves." -> They actually did so many years ago, http://resources.arcgis.com/en/help/main/10.2/index.html#//019v0000001w000000 Time to ease up on your angry ranting and stop spreading FUD? – bugmenot123 May 19 '17 at 14:28
  • @bugmenot123 so you are talking about my "ranting" from 2012 when this was true? Give me a break buddy. And just so you get the argument right, I was talking about iOS and LGPL - this still holds true and the support you talk about is for Desktop that allows dynamic linking. If you are going to criticize, at least know what you are talking about. – Ragi Yaser Burhum May 31 '17 at 23:46
  • Stack Exchange is not a volatile forum, its answers are supposed to stand the test of time. Your answer was borderline back then and it is not useful today. By the way, apparently since 3 years ago you can use dynamic linking on iOS: https://stackoverflow.com/a/4733885/4828720 – bugmenot123 Jun 01 '17 at 07:57
  • 2
    @bugmenot123 Fine, then correct it if you wish, but don't accuse me of spreading FUD about OS because it is insulting. I have been writing OS code for over a decade (I would not be surprised if you have used some of mine, actually) and that was not an angry rant. It was true - and it still is. Dynamic linking of LGPL code in iOS (well, to be precise, frameworks) has been allowed since iOS 8. This has never been a technical issue, but a legal one. Distribution in the Appstore requires code signing - and sadly for all OS lovers like me - LGPL is a fuzzy license for this. No precedent in court. – Ragi Yaser Burhum Jun 01 '17 at 15:27
  • So everyone interprets it differently. If you want to understand more, you can read the wiki of one of the most popular dual license open source frameworks in the world (QT) https://wiki.qt.io/Licensing-talk-about-mobile-platforms This is a complex issue that simply does not exist with Apache, MIT, BSD and other OS licenses. Heck, even arguably the most popular geometry library in the world, JTS, is going through a full relicensing effort right now through the Eclipse LocationTech group to, among other things, avoid that legal mess for usage and derived work. – Ragi Yaser Burhum Jun 01 '17 at 15:33
  • @bugmenot123 by the way, several of the answers you give in this forum are about open source projects I contribute code to heavily. You are using my OS code... you are welcome :) – Ragi Yaser Burhum Jun 01 '17 at 15:39
18

The SHP+SHX part itself isn't so bad. The real problem lies in the DBF part. That could do with a new format, which supported unicode and all sorts of modern field types. The problem is getting it well supported by all the software out there.

Uffe Kousgaard
  • 6
    +1 Improving on the DBF part isn't at all difficult, either: it really does come down to persuading software developers to agree on something. – whuber Feb 20 '12 at 21:57
  • 1
    Has there been an attempt? – canisrufus Feb 21 '12 at 01:28
  • 6
    I've often pined for a Shapefile amendment that simply substituted a UTF-8 CSV file for the DBF. It would be simple to support and require minimum changes to existing software packages. – scw Feb 21 '12 at 19:03
  • CSV do not allow for random access, so it is a no-go. – Uffe Kousgaard Feb 21 '12 at 21:56
  • 1
    @canis Fox Software made a minor (proprietary) attempt in the late '80s. After MS purchased them (c. 1990), that was that. The community created a DBF 3 standard and that pretty much froze all development. MS released Access; FoxPro died out; the world moved on. – whuber Feb 22 '12 at 17:38
  • 1
    On the contrary, @Uffe, CSV files can be randomly accessed: you only need an index, just like DBF files do for efficient searches. The biggest problem I see is that seemingly minor changes that happen naturally to CSV files, like quoting strings or CR/LF conversions, will screw up all the byte offsets. The fixed length record structure of a DBF file, although less efficient in storage, does not have that problem. – whuber Feb 22 '12 at 17:47
  • This is a joke, right? You can't seriously suggest an index on CSV files. – Uffe Kousgaard Feb 22 '12 at 21:28
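
For readers trying to follow the disagreement above, this is roughly what "an index on a CSV file" would mean in practice: record the byte offset where each row starts, then seek straight to a row. It is only a sketch, assuming no quoted field contains an embedded newline; the function names are hypothetical. It also makes whuber's caveat concrete, because any rewrite that changes line endings or quoting shifts the offsets and invalidates the index.

```python
import csv
import io


def build_offset_index(fileobj):
    """Return the byte offset at which each data row starts (header row skipped).

    Assumes one record per physical line, i.e. no newlines inside quoted fields.
    """
    offsets = []
    fileobj.seek(0)
    fileobj.readline()  # skip the header row
    while True:
        pos = fileobj.tell()
        line = fileobj.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets


def read_record(fileobj, offsets, i):
    """Seek directly to row i and parse just that line."""
    fileobj.seek(offsets[i])
    line = fileobj.readline().decode("utf-8")
    return next(csv.reader([line]))


# Made-up sample data; UTF-8 and variable-length rows are the point here.
data = 'id,name\n1,Zürich\n2,"Portland, OR"\n3,Lima\n'.encode("utf-8")
f = io.BytesIO(data)
index = build_offset_index(f)
print(read_record(f, index, 1))  # row with the quoted comma
```

With a DBF, no such index is needed for access by row number, because every record has the same length and the offset is simply header length plus row number times record length.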
11

GeoPackage is a promising successor. It's similar to Spatialite but comes from the OGC, and it has been adopted by many software packages, including ArcGIS and OGR.

See the official homepage http://www.geopackage.org/ and e.g. this presentation: http://www.slideshare.net/JeffYutzler/geopackage-swg-overview

PolyGeo
Stefan
7

At least spatialite has the intention, see e.g. this presentation: http://www.sourcepole.ch/assets/2010/9/10/foss4g2010_spatialite.pdf

On the other hand, I do believe that the main reason that it failed is that shp is well supported by many applications and only has minor deficiencies.

Others share this opinion as well:

This isn’t because the SpatiaLite project hasn’t given tools to us to implement, it has been the community could care less about it. SHP works for them and there isn’t any reason to change.

http://www.spatiallyadjusted.com/2010/09/16/spatialite-is-not-the-shapefile-of-the-future/

More thoughts on Esri file geodatabase, spatialite and autodesk sdf here: http://www.spatialdbadvisor.com/blog/121/the-shapefile-manifesto

johanvdw
  • As great as I think spatiaLite is, it's ~3 megabytes of overhead in functions, reference systems, etc. that keep it from being a good all-around exchange format. – Scro Feb 20 '12 at 21:51
  • Actually, the license for spatialite is less than ideal - it has nothing to do with the tools. – Ragi Yaser Burhum Feb 21 '12 at 01:53
  • @Scro, 3 megabyte is too big? It certainly isn't too big for the desktop. You must be considering mobile devices. Also, is there another Spatial API, with equivalent functionality, in a smaller size than Spatialite? – klewis Feb 21 '12 at 14:14
  • @klewis -- it's not too big per se, it's just very inefficient when you consider there are a lot of small (think < 200kb) datasets out there. That's a lot of overhead, especially in light of the fact that, once received, you would typically leave each dataset in its 3mb file, or roll it into an existing database. Just to be clear, I <3 spatiaLite -- but we're talking about data transmission, where some sort of flat file/xml/wkb would be much more efficient. – Scro Feb 21 '12 at 14:58
6

Esri's been promoting File Geodatabases for several years now as a replacement for shapefiles.

More recently they've provided an API that hides any oddities.

Kirk Kuykendall
  • I haven't worked with geodatabases a great deal. Wikipedia says they're a "closed" standard, i.e. the geodatabase spec hasn't been published. It seems hard to get very wide adoption without publishing the internals of the format. While I'm too young to know the history, it's my guess that shapefiles are in part so popular because of the public portion of the specification. The API does seem like a good step. – canisrufus Feb 20 '12 at 21:10
  • 2
    @canis you are correct. At the time nobody would have adopted shapefiles except that ESRI specifically promoted them as an open GIS data exchange format. Even with the limited software tools available at the time, with ESRI's release of a clear .shp/.shx specification (and a commitment to stick to it), it became a matter of just a few hours' work to write code to read and write shapefiles: no reverse-engineering necessary. – whuber Feb 22 '12 at 17:41
  • As long as the API is a black box binary blob, FGDB won't see the same adoption as SHP. Even if Esri convinces all their customers to switch to FGDB from SHP, the API isn't really compatible with open source. – dericke Jul 01 '15 at 19:28
3

An XML dialect like GML is definitely not optimized for working with huge datasets, but it can be used as an exchange format between software packages or between platforms.

I don't believe there is any problem with the licensing (see Ragi Yaser Burhum's post about the viral characteristics of Spatialite) and it is fairly easy to adapt existing parsers if required.

PolyGeo
Stéphane Henriod
  • 1
    I think it hasn't been mentioned for just the reason you're bringing up, that it's not optimized for large datasets. XML is bloated. The formats mentioned here are binary, where GML stores points as strings. The size can be over an order of magnitude different. – canisrufus Feb 21 '12 at 18:05
  • 3
    Canisrufus is right. There are several problems with GML. The Infoset can be navigated using XPath, but anybody that has tried to implement spatial indexing on top of XML will tell you how irrational this is and how badly it maps to traditional relational databases. Without going into many details, if something as basic as indexing and querying becomes non-trivial, the format is bloated, and it basically requires you to have the entire dataset in memory to do anything with it, then this is not a good option. – Ragi Yaser Burhum Feb 21 '12 at 18:56
  • 4
    xml is bloated when stored as plain text. There are freely (both free of charge and free to modify and redistribute) available binary xml libraries that can serve as drop-in replacements for xml readers, giving people the freedom to utilize both the human readability of xml and the performance and storage efficiency of binary. The only reason I can think of for it never being taken up in a large way is as johanvdw observes above: no one cares, .shp has been "good enough" as is. – matt wilkie Feb 21 '12 at 20:46
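
To put a rough number on the "points as strings" remark, the sketch below encodes one point both ways: as well-known binary (byte-order flag, geometry type, two doubles) and as an uncompressed GML fragment. The coordinates and the exact GML snippet are made up for illustration, and the ratio for real datasets depends on coordinate precision, attributes and compression, so treat it as an indication only.

```python
import struct


def point_wkb(x, y):
    """Well-known binary for a 2D point: 1 + 4 + 8 + 8 = 21 bytes."""
    # 1 = little-endian byte order, 1 = geometry type Point.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_gml(x, y):
    """A minimal, illustrative GML point fragment as UTF-8 text."""
    text = f'<gml:Point srsName="EPSG:4326"><gml:pos>{x} {y}</gml:pos></gml:Point>'
    return text.encode("utf-8")


x, y = 45.123456, -122.654321  # arbitrary sample coordinates
wkb = point_wkb(x, y)
gml = point_gml(x, y)
print(f"WKB: {len(wkb)} bytes, GML: {len(gml)} bytes")
```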
1

Just to come at this from a different perspective, I'm not sure the use of "Percent of citizens age 25 and over with a bachelor's degree or higher" is a perfectly good field name. While mixing spaces and apostrophes can be handled, if you are writing code or queries it is more likely to introduce bugs.

In my opinion the future of spatial data distribution should focus on the web and web services, and the WFS specification (which uses GML) is open and established. GeoJSON is smaller, and can be easier to work with in JavaScript. However with compression the sizes are comparable.

I'd also like to throw in a vote for ESRI's Personal Geodatabases. It may be an oft maligned Microsoft format, but it supports ODBC, SQL queries, views, and allows non-developers to create easy data entry forms, and include at least some level of data integrity checks (data types, lengths, unique values).

geographika
  • That's a valid point. What's good about them is that given knowledge of the English language, one can figure out what the fields mean. – canisrufus Feb 23 '12 at 14:17
  • That's really the role of the dataset's metadata though. The shapefile can use an XML file with the same name to store this. – geographika Feb 23 '12 at 18:00