Looking at the increasing NoSQL movement and considering that databases like MongoDB offers a new perspective in flexible data storage for GIS. What is the best way to store lines and polygons in JSON documents to take advantage of 2d indexes and spatial functions?
MongoDB doesn't currently support indexing on anything other than points, and its spatial functions are limited to finding within bounds. – scw Apr 26 '11 at 20:10
3 Answers
Here's an example of a point, a line and a polygon stored as a GeoJSON FeatureCollection:
{ "type": "FeatureCollection",
"features": [
{ "type": "Feature",
"geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
"properties": {"prop0": "value0"}
},
{ "type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
]
},
"properties": {
"prop0": "value0",
"prop1": 0.0
}
},
{ "type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
[100.0, 1.0], [100.0, 0.0] ]
]
},
"properties": {
"prop0": "value0",
"prop1": {"this": "that"}
}
}
]
}
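The depth of "coordinates" in GeoJSON depends on the geometry type: a Point holds one position, a LineString an array of positions, and a Polygon an array of linear rings, each of which is an array of positions. Any scheme that indexes the vertices of lines and polygons, rather than the features themselves, first has to pull those positions out. Here is a minimal JavaScript sketch of that step. The helper name `positions` is hypothetical and is not part of MongoDB or any GeoJSON library.

```javascript
// Flatten the vertices of a GeoJSON geometry into an array of [x, y] positions.
// Hypothetical helper; GeometryCollection (which has no "coordinates") is not handled.
function positions(geometry) {
  const c = geometry.coordinates;
  switch (geometry.type) {
    case "Point":
      return [c];            // a single position
    case "LineString":
    case "MultiPoint":
      return c.slice();      // already an array of positions
    case "Polygon":
    case "MultiLineString":
      return c.flat();       // rings (or lines) -> positions
    case "MultiPolygon":
      return c.flat(2);      // polygons -> rings -> positions
    default:
      throw new Error("unsupported geometry type: " + geometry.type);
  }
}
```

For the LineString above, `positions` returns its 4 vertices. For the Polygon it returns 5, because a GeoJSON ring repeats its first position at the end to close itself.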
This statement from another answer is simply not true:
"to take advantage of spatial indexes in Mongo, you'd need a spatially indexed collection holding nothing but a record for each of the polygon's points, with an additional value for the record ID of your spatial record living in another collection, then use a bounding box query to get record IDs from one [collection] and select [record data] from the other [collection], effectively emulating a join."
I have USGS point data stored in a single Mongo collection with records that look like this:
> db.names.find({FEATURE_NAME: 'Mount Saint Helens', STATE_ALPHA: 'WA'})
{ "_id" : ObjectId("4e262106d7a99b7db41a4919"),
"_ID" : 1525360,
"FEATURE_NAME" : "Mount Saint Helens",
"FEATURE_CLASS" : "Summit",
"STATE_ALPHA" : "WA",
"STATE_FIPS" : 53,
"COUNTY_NAME" : "Skamania",
"COUNTY_FIPS" : "059",
"COORDS" : [ -122.1944, 46.1912 ],
"ELEV_IN_FT" : "8356" }
I am able to do bounding box queries on this data that return the entire record (without the need for another collection) just fine.
Query:
> box = [[-126.562500,45.089036], [-123.750000,47.040182]]
[ [ -126.5625, 45.089036 ], [ -123.75, 47.040182 ] ]
> db.names.find({"COORDS" : {"$within" : {"$box" : box}}, FEATURE_CLASS: "Summit"}, {FEATURE_NAME: true, COUNTY_NAME: true, STATE_ALPHA: true, ELEV_IN_FEET: true}).limit(5);
Response:
{ "_id" : ObjectId("4e2620f8d7a99b7db4146cec"), "FEATURE_NAME" : "Harlocker Hill", "STATE_ALPHA" : "OR", "COUNTY_NAME" : "Coos" }
{ "_id" : ObjectId("4e2620f8d7a99b7db414a349"), "FEATURE_NAME" : "Neskowin Crest", "STATE_ALPHA" : "OR", "COUNTY_NAME" : "Tillamook" }
{ "_id" : ObjectId("4e2620f8d7a99b7db414a105"), "FEATURE_NAME" : "Miles Mountain", "STATE_ALPHA" : "OR", "COUNTY_NAME" : "Tillamook" }
{ "_id" : ObjectId("4e2620f8d7a99b7db414934a"), "FEATURE_NAME" : "Mount Gauldy", "STATE_ALPHA" : "OR", "COUNTY_NAME" : "Tillamook" }
{ "_id" : ObjectId("4e2620f8d7a99b7db4149d06"), "FEATURE_NAME" : "Little Hebo", "STATE_ALPHA" : "OR", "COUNTY_NAME" : "Yamhill" }
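The `$box` query above asks for documents whose `COORDS` point lies inside the rectangle defined by its lower-left and upper-right corners. The JavaScript sketch below only illustrates that predicate on plain objects. MongoDB answers the real query from its 2d index rather than by scanning every document like this, and treating the box edges as inclusive is an assumption of the sketch. `withinBox` and `findWithinBox` are hypothetical names.

```javascript
// True if point [x, y] lies inside box [[minX, minY], [maxX, maxY]].
// Box edges are treated as inclusive (an assumption of this sketch).
function withinBox([x, y], [[minX, minY], [maxX, maxY]]) {
  return x >= minX && x <= maxX && y >= minY && y <= maxY;
}

// Linear-scan stand-in for find({COORDS: {$within: {$box: box}}}) over an array of documents.
function findWithinBox(docs, box) {
  return docs.filter((doc) => withinBox(doc.COORDS, box));
}
```

With the box from the query, Mount Saint Helens at [-122.1944, 46.1912] falls outside, because its longitude is east of the box's maximum longitude of -123.75.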
Mongo also supports nearest-neighbor searches ($near) and point-in-polygon searches ($within with $polygon). Both are well documented at mongodb.org.
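Below is a JavaScript sketch of what a point-in-polygon test decides. It uses the classic even-odd (ray casting) algorithm as an illustration. This is not MongoDB's own implementation. Points lying exactly on an edge may be classified either way. The function names are hypothetical. The polygon argument follows the GeoJSON layout, where the first ring is the exterior and any later rings are holes.

```javascript
// Even-odd (ray casting) test for one ring of [x, y] positions:
// cast a ray from the point toward +x and count how many ring edges it crosses.
function pointInRing([px, py], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Edge (j -> i) straddles the horizontal line y = py ...
    const straddles = (yi > py) !== (yj > py);
    // ... and crosses it to the right of the query point.
    if (straddles && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// GeoJSON Polygon coordinates: [exteriorRing, ...holeRings].
function pointInPolygon(point, polygonCoords) {
  const [outer, ...holes] = polygonCoords;
  return pointInRing(point, outer) && !holes.some((hole) => pointInRing(point, hole));
}
```

Against the GeoJSON polygon example ([100, 0] to [101, 1]), [100.5, 0.5] is inside and [102, 0.5] is not.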
Apologies but I'm confused, MongoDB can or cannot create a spatial index on line and polygon feature collections? – Derek Swingley Jul 28 '11 at 04:24
It cannot create a spatial index on line and polygon features at this time. However, it can do a point-in-polygon search on a table with points in it, if you provide the polygon geometry as part of the query. http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-BoundsQueries – lagerratrobe Jul 28 '11 at 16:20
OK, so the statement: "GeoJSON is a fantastic format but to take advantage of the limited (POINT-ONLY) spatial indexes in Mongo" is actually true because Mongo can only spatially index points. – Derek Swingley Jul 28 '11 at 16:45
I grant you that a portion of that sentence is accurate, "limited (POINT-ONLY) spatial indexes". So 5 out of 71 words, or 7%. That leaves 93% of it being incorrect. I stand behind my statement. – lagerratrobe Jul 28 '11 at 17:27
Can you edit your answer to clarify? As is, it's confusing and misleading. Regarding the other portion of the statement, isn't that basically just a suggestion for implementing a spatial index for non-point data? It might not be ideal or optimal, but it's just a suggestion. Elaborating on why you think the majority of that statement is wrong would help too. – Derek Swingley Jul 28 '11 at 17:36
I've updated my original response to try to clarify things. My disagreement stems from the poorly worded phrase which states that 2 collections (tables) are needed in order to take advantage of spatial indexing. At this point it feels like we're debating semantics, and I'm not going to spend any more time on this. Read the link to the mongo docs that I included in my answer. It should provide all the clarity that is needed. – lagerratrobe Jul 28 '11 at 18:03
I -think- Jason Scheirer's answer specifically applies to using Mongo's spatial indexes for polygon features. To take advantage of spatial indexes while doing a spatial query against polygons, you would have to run the query against the vertices, use the matching vertices to query the collection table for the intersecting polygons, and then use those results to fetch each polygon's vertex collection from the point table to determine whether the entire polygon lies inside the bounding box. There is no way to handle polygons with spatial indexes without the vertex collections. – blord-castillo Jul 28 '11 at 21:05
Not to mention that Mongo offloads onto the developer a lot of the heavy lifting that PostGIS (and every other spatially enabled RDBMS) does with geometries: the ST_* operations such as the Clementini operators, and producing buffered/intersected/clipped geometries. Mongo satisfies some very basic use cases, but it's still in no way, shape or form as mature and robust for spatial applications as an RDBMS. It is web scale, however. – Jason Scheirer Jul 29 '11 at 17:13
One thing to note is that MongoDB's support for spatial datatypes is horrendously bad for any serious spatial lookup, and this applies across the board with NoSQL last time I checked. I dislike GeoCouch quite a bit less, but it also still has a way to go.
GeoJSON is a fantastic format but to take advantage of the limited (POINT-ONLY) spatial indexes in Mongo you'd need a spatially indexed collection holding nothing but a record for each of the polygon's points, each with an additional value for the record ID of your spatial record living in another collection. You would then use a bounding box query to get record IDs from the first collection and select the records from the other, effectively emulating a join.
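The two-collection pattern described above can be sketched in plain JavaScript with in-memory arrays standing in for the collections. In MongoDB, step 1 would be a `$within`/`$box` query on a 2d-indexed field of the vertex collection and step 2 a second `find` using `$in` on the collected IDs. The names `explodeVertices`, `queryByBox`, `featureId` and `loc` are hypothetical.

```javascript
// One record per vertex, carrying the id of its parent feature (hypothetical schema).
function explodeVertices(featureId, geometry) {
  let pts;
  if (geometry.type === "LineString") pts = geometry.coordinates;
  else if (geometry.type === "Polygon") pts = geometry.coordinates.flat(); // rings -> positions
  else throw new Error("sketch only handles LineString and Polygon");
  // A closed ring repeats its first vertex; the duplicate record is harmless below.
  return pts.map((loc) => ({ featureId, loc }));
}

function withinBox([x, y], [[minX, minY], [maxX, maxY]]) {
  return x >= minX && x <= maxX && y >= minY && y <= maxY;
}

// Step 1: box query on the vertex records -> distinct parent ids.
// Step 2: fetch the full features by id from the other "collection".
function queryByBox(vertices, featuresById, box) {
  const ids = new Set(vertices.filter((v) => withinBox(v.loc, box)).map((v) => v.featureId));
  return [...ids].map((id) => featuresById.get(id));
}
```

Because only vertices are indexed, a feature that crosses the query box without having a vertex inside it is not returned.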
You could go hacky and just store the corners of the bounding box as points for your records, but then bounding box searches may fail (a query box that overlaps a feature without containing any of its corners will miss it). All in all, it forces some pretty inefficient design patterns and inappropriately pushes all kinds of responsibilities onto the developer.
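Here is a small JavaScript sketch of why the corner-points shortcut fails. A search over stored corner points finds a feature only if one of its four bounding-box corners lands in the query box. The question the search should answer is whether the two rectangles overlap at all. All function names are hypothetical.

```javascript
function withinBox([x, y], [[minX, minY], [maxX, maxY]]) {
  return x >= minX && x <= maxX && y >= minY && y <= maxY;
}

// The corner-point search: does any stored bbox corner fall inside the query box?
function cornersHit(bbox, query) {
  const [[minX, minY], [maxX, maxY]] = bbox;
  const corners = [[minX, minY], [maxX, minY], [maxX, maxY], [minX, maxY]];
  return corners.some((c) => withinBox(c, query));
}

// What the search should answer: do the two rectangles overlap?
function boxesOverlap([[aMinX, aMinY], [aMaxX, aMaxY]], [[bMinX, bMinY], [bMaxX, bMaxY]]) {
  return aMinX <= bMaxX && bMinX <= aMaxX && aMinY <= bMaxY && bMinY <= aMaxY;
}
```

Take a long, thin feature with bbox [[0, 0], [10, 1]] and a query box [[4, -1], [6, 2]] across its middle. `boxesOverlap` is true, but `cornersHit` is false, so the corner-point search misses the feature.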
As a reference implementation, you may want to refer to this code which was presented at the Esri Developer Summit this year.
I have not been happy at all with the spatial support on the various NoSQL databases. They only go far enough for dumb point cloud lookup, which makes sense considering most apps using this are only dropping pushpins onto a Google map on a browser somewhere. PostGIS is still going to be the best open-source workhorse for managing spatial information for the foreseeable future.