3

I have a very simple MapInfo workflow for geocoding a CSV file which contains UK postcodes, based on an SQL select query referring to point data for postcodes.

This is simple. Using Mapinfo open the CSV, open the postcode layer, run a single select query which combines according to the matching column and creates a temporary table (combining CSV data with the Postcode data AND geometry), save the table.

I normally work in QGIS. I'd like to be able to omit Mapinfo from this workflow, but I cannot find any sensible way to achieve this, and certainly nothing so simple (despite already using table joins, relationships, filters, and all sorts of other techniques elsewhere). I'm sure that there's a process which involves a Spatialite database, but I'm out of my depth at this point, and it's also going to be a process with multiple (relatively involved) steps too.

Am I missing something?

Rostranimin
  • 2,980
  • 2
  • 26
  • 38
  • Where is your postcode-coordinate information stored? In a shapefile or similar with postcode as an attribute? Is that what you mean by "the postcode layer"? – Spacedman Apr 17 '15 at 11:54
  • The question is broader than this one specific situation - but this situation is a good example of a common issue. I have CSV data with a postcode in it (without any geometry information - it could be another dataset with another key field - this isn't an issue unique to postcodes). I have, as do many other people, the UK postcode point data - a points layer with one point for each postcode (as you'd expect each point has the relevant postcode in one data field). I'm geocoding the CSV data, matching the postcode from the CSV and the points layer using the geometry(objects) in the points layer. – Rostranimin Apr 17 '15 at 14:30
  • Righto, I think we need some sample dummy data sets for playing with... (oh, and its easy to do in R). – Spacedman Apr 17 '15 at 14:56
  • Although it would be a shame if QGIS can't do this directly I personally, at least, would be happy enough with another open source tool. The key should be that the process is as simple as it would be on MapInfo ) – Rostranimin Apr 17 '15 at 15:14
  • I'd be happy enough if I end up with another open source tool that I use alongside QGIS. It would be a shame if it's necessary always to use this, but far from a disaster. Can I do it using the SQL available with ogr2ogr for example? – Rostranimin Apr 17 '15 at 15:15
  • What do you want to do if one of your data points has a postcode that isn't in the postcode-coord lookup? – Spacedman Apr 17 '15 at 15:16
  • Something (anything) meaningful with non-matched postcodes. A data table of unmatched data. Even just a single error message and then skipped data rows would work (I can always check what rows were missed afterwards). Preferably not a silent failure. – Rostranimin Apr 17 '15 at 15:19
  • This sounds less like geocoding and more like a straight one-to-many join. This question is fairly old but has some methods of doing so in QGIS. There's also this question in which the ultimate answer was to load both tables into PostGIS (a true relational database) and do the join there. Some good info in that thread if not a direct solution. – Chris W Apr 17 '15 at 19:25
  • Thanks Chris. The method where the data with geometries is converted to text sounds interesting... but not reasonable for this purpose given the large postcode dataset. I'm sure there are also a whole range of database oriented solutions, but I'm looking for something simple and quick to replace what in MapInfo is a really simple (if you know how) and quick method. Spacedman's MMQGIS based solution works well. I'd also like now to see if ogr2ogr's SQL capabilities can do this but perhaps that's for a separate question. – Rostranimin Apr 18 '15 at 08:52
  • As a minor additional note, I've tested a join using ogr2ogr sql and it works, but as with a join created directly in QGIS I end up without any geometry on the data. No doubt my lack of knowledge is showing up here - so this comment really for others in my shoes. – Rostranimin Apr 18 '15 at 20:43
  • The no geometry issue is in the second linked question - particularly comments on John's answer, where basically we talk about a join not being able to create or transfer geometry. A query can do that. I find much confusion in QGIS's join tools, because they almost never address the relationship (1:1, 1:M, M:1, or M:M). Some of those can only be handled with a relational database, but I'm somewhat used to seeing options in the ArcGIS tools (like it lets you choose 1:1 or 1:M and the result produces duplicates if needed). In QGIS I don't see this and they don't tell you which it will do. – Chris W Apr 18 '15 at 23:13
  • Thanks Chris. At some point I need to get more familiar with how some of the databases work (I'm far from being a stranger to databases per say - I've set up several simple ones in the past for other reasons). The learning curve is always pretty steep though. Maybe what I haven't realised is that this is one thing that MapInfo does really well and simply. It's certainly going to continue to be my main tool for this purpose (really only this purpose) for the moment.. – Rostranimin Apr 21 '15 at 20:48

2 Answers2

2

Since I asked this question QGIS has developed the capacity to do this more simply and easily... (since 2.14 I think)... so I'm returning to answer my own question.

Use a QGIS 'Virtual Layer'. Menu: Layer|Add Layer|Add Virtual Layer (or corresponding button)

Image of dialogue box for virtual layer

In the image is the add virtual layer dialogue window that appears - in this case I'm working with a layer with points 'TestPCode' and a csv layer 'TestSchools' which has school data with matching postcodes but no geometries.

  1. Give your layer a name
  2. Use the import button to add the two files (to the 'Embedded layers' box).
  3. Write the appropriate SQL (see further notes below)
  4. test if you like
  5. 'OK'

I'm (obviously) no expert on SQL - but things aren't quite as simple as they might be. The statement in the image does odd things where there's a null value for the field to be matched in the CSV file ("Postcode"). It matches a null value in the csv file to each and every object in the other table. To get the join to only create objects where there's a proper full match I've found that the following works...

SELECT "TestSchools".*,"TestPCode".* FROM "TestSchools" inner join "TestPCode" ON "TestPCode"."TestPCode" = "TestSchools"."Postcode" and "TestSchools"."Postcode" is not null

(Advice on getting the SELECT statement right would be appreciated - as would any further comment - I've been feeling my way on this)

Rostranimin
  • 2,980
  • 2
  • 26
  • 38
  • Thanks for this solution. I think its the closest yet to what can be done in Mapinfo within a few seconds. Works great with smaller layers, but with a full UK postcode dataset this process is very slow. I wonder how MI does it so quick!? – BStone Oct 19 '17 at 15:50
  • Now that I've worked out how to work properly with Spatialite databases I'd try using this in future (for speed). At the moment the workflow for this is much more complex than the simple one on MI - but once you know how to do this it's quick and simple. What I don't know at the moment is whether this makes the query faster, but I'd expect it to. The query would be run within the Spatialite database rather than by QGIS itself. – Rostranimin Oct 23 '17 at 09:42
1

Use the MMQGIS plugin's "Attributes Join from CSV File" method.

I have a shapefile called postcodes with just the postcode attribute, and a CSV file called people.csv with a name and a postcode column. I load the postcode shapefile into QGIS (but not the CSV).

Then MMQGIS: Combine: Attributes Join From CSV File and fill the dialog thus:

enter image description here

That gives me a new shapefile with the attributes from the CSV and the locations looked up by postcode. The "CSV File Field" is matched to the "Join Layer Attribute". Non-matching lines in the CSV get written to notfound.csv

MMQGIS is available from the plugin loader.

Spacedman
  • 63,755
  • 5
  • 81
  • 115
  • Does this deal with the one-to-many issue? Isn't this another way to create a join bringing the CSV data onto the postcode data (so won't work with two people one postcode). Will try. – Rostranimin Apr 17 '15 at 15:28
  • No, you get one point for every line in your CSV, (minus the ones that don't match). – Spacedman Apr 17 '15 at 15:30
  • Interesting - I remember now I tried this before - and again now - but I get an error "Attribute Join / No geometry in layer: Postcodes" (I know the data is valid from doing this in Mapinfo). Perhaps some specific issue with the plugin? I'm also trying to work out an ogr2ogr solution but struggling (see http://gis.stackexchange.com/questions/95746/join-a-csv-file-to-shapefile-using-gdal-ogr ) – Rostranimin Apr 17 '15 at 15:44
  • Is your postcodes layer a point shapefile? Or is it a CSV you've loaded without geometry? Can you map it? Something's not right. – Spacedman Apr 17 '15 at 15:50
  • This should work - I'm not doing anything silly (?!?I believe). Postcodes.TAB (tried converting to .SHP too) is proper point data. Gardens.csv (tried comma delimited, tab delimited, with titles, without titles). Did it in 2 minutes on Mapinfo! – Rostranimin Apr 17 '15 at 16:12
  • So I think I've nailed the method, it something in your data or your setup. Can you make the data available? – Spacedman Apr 17 '15 at 20:27
  • OK - I agree - your answer is a good one. Tested at home (i.e. different machine) with temporary very simple data constructed by hand as a SHP and CSV file. It works well. I'll do some checks to see what the failure is at work. Numerous possibilities (larger postcode dataset, postcodes in TAB format, some fault in the data, data on a shared server, etc). I'd still like to see the power of the MapInfo SQL SELECT ability closer at hand in QGIS, but this is good and simple (when it works) for less experienced users at least. Thank you! (Postcode data available from Ordnance Survey website.) – Rostranimin Apr 18 '15 at 08:37
  • I know this is old, but I also get the same 'no geometry' error in very similar circumstances (large postcode layer in TAB format with 2.6mil rows) perhaps it is a size issue. Also baffled that something so easy to do in Mapinfo (even the really ancient versions have SQL select) is so difficult to achieve in QGIS. I find it hard to believe that the problem we have (adding geom to a CSV using a large spatial layer as a lookup) isn't a really common one encountered by QGIS users. – BStone Oct 19 '17 at 13:58