I currently have a large and growing database of geo-social media. I'm using Postgres with PostGIS, and so far it's been quite good.
My use case so far has been to query the database for all posts within specific areas, such as every tweet inside each postal/zip code in the US and Canada (or other boundaries). The posts are not geocoded to a postal/zip code; selection is done by intersecting each post's coordinates with the boundary geometry.
In the past this was relatively quick and easy to do. However, now that my data table has grown to over 350 million posts, the queries take forever to complete: a single postal/zip code can take up to a day to retrieve the tweets within its geometry.
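For reference, the retrieval is essentially the pattern below (table and column names are placeholders rather than my actual schema), with the intersection test backed by a GiST index on the point geometry:

```sql
-- Placeholder schema: posts(id, geom) and zip_boundaries(zip, geom).
-- The spatial index is what lets the intersection avoid scanning all 350M points.
CREATE INDEX posts_geom_gist ON posts USING GIST (geom);

-- All posts falling inside one postal/zip-code polygon.
SELECT p.id
FROM posts p
JOIN zip_boundaries z ON ST_Intersects(z.geom, p.geom)
WHERE z.zip = '90210';
```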
I am currently debating what to do next. I know of a few options, but I'm curious what you think of them and whether I'm missing anything:
1. Throw more memory at Postgres and reduce disk seeking. I'm currently working on this (a config sketch follows the list), but I wonder whether it is a good long-term solution.
2. Find a better way to partition the data across more tables. Currently all of my posts are in a single table, with every column indexed. Given that I need to scan all of the posts to find the ones inside a given area, is this the wrong way to be doing it? (See the partitioning sketch below.)
3. Create a Cassandra-PostGIS hybrid: bulk-store the data in Cassandra, pull out roughly the data I need (say, one state/province at a time) into a staging table in PostGIS, then query individual postal/zip codes with PostGIS (a rough sketch of the staging step is below). Is this overkill or horribly inefficient? I have looked into Postgres-XL, but it has been kiboshed by others on this project over cost concerns.
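For option 1, this is the kind of tuning I mean (values are purely illustrative for a dedicated box and would need adjusting to the actual hardware; `ALTER SYSTEM` assumes Postgres 9.4+):

```sql
-- Illustrative values only; tune to the machine's RAM and workload.
ALTER SYSTEM SET shared_buffers = '16GB';         -- cache more table/index pages in Postgres (restart required)
ALTER SYSTEM SET effective_cache_size = '48GB';   -- tell the planner how much the OS can cache
ALTER SYSTEM SET work_mem = '256MB';              -- per-sort/hash memory; watch concurrent queries
ALTER SYSTEM SET maintenance_work_mem = '2GB';    -- faster index builds and VACUUM
ALTER SYSTEM SET random_page_cost = '1.1';        -- closer to 1.0 if the data lives on SSDs
SELECT pg_reload_conf();                          -- applies the settings that don't need a restart
```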
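For option 2, what I picture is partitioning by a coarse region key assigned at load time, so a per-state/province query only touches one partition. This sketch assumes declarative partitioning (Postgres 10+) and hypothetical names; older versions would do the same thing with inheritance:

```sql
-- Posts partitioned by a coarse region code derived once on insert,
-- so a per-state/province pull never touches the other partitions.
CREATE TABLE posts (
    id     bigint,
    region text,                       -- e.g. state/province code
    geom   geometry(Point, 4326)
) PARTITION BY LIST (region);

CREATE TABLE posts_on PARTITION OF posts FOR VALUES IN ('ON');
CREATE TABLE posts_ny PARTITION OF posts FOR VALUES IN ('NY');
-- ... one partition per state/province ...

-- Each partition keeps its own spatial index for the zip-level intersections.
CREATE INDEX ON posts_on USING GIST (geom);
CREATE INDEX ON posts_ny USING GIST (geom);
```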
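And for option 3, the PostGIS half of the hybrid would look roughly like this, independent of whether the bulk store ends up being Cassandra (again, table names are placeholders):

```sql
-- Stage roughly one state/province worth of posts pulled from the bulk store,
-- then run the fine-grained zip-code intersections against the much smaller table.
CREATE TABLE staging_posts (
    id   bigint,
    geom geometry(Point, 4326)
);
-- ... bulk-load the state's rows here (e.g. COPY from an export out of Cassandra) ...

CREATE INDEX ON staging_posts USING GIST (geom);
ANALYZE staging_posts;

-- Tag every staged post with the postal/zip code whose polygon it falls in.
SELECT z.zip, p.id
FROM staging_posts p
JOIN zip_boundaries z ON ST_Intersects(z.geom, p.geom);
```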