
I am trying to merge non-spatial data (a data frame in R) with spatial data (a SpatialPolygonsDataFrame in R) so that the merged result is still a SpatialPolygonsDataFrame.

Specifically, I downloaded the zip code area boundary file from the Census website (https://www.census.gov/geo/maps-data/data/cbf/cbf_zcta.html) and read it in as a SpatialPolygonsDataFrame using the readShapePoly function in R.

I then merged the SpatialPolygonsDataFrame with my data frame, but the merged result is a plain data frame, not a SpatialPolygonsDataFrame.

Can someone tell me how to get a SpatialPolygonsDataFrame after merging spatial and non-spatial data? I have spent the entire day on this without getting anywhere.

The code that I used is as follows:

library(maptools)  # provides readShapePoly

nation <- readShapePoly("C:/Users/Research/data/nation.shp")
us_urban_zipcode <- as.data.frame(us_urban_index13[c(39, 91)])
us_urban_index_nation <- merge(us_urban_zipcode, nation,
                               by = "ZCTA5CE10", all.x = TRUE)

nation is a SpatialPolygonsDataFrame and us_urban_zipcode is a data frame. Merging them returns a data frame, not the SpatialPolygonsDataFrame that I need for further analysis.

My non-spatial data looks like this:

---------------
zipcode | row
 10003  |  1
 10002  |  2
 10003  |  3
 10004  |  4
 10002  |  5
  ...   | ..
-----------------

And my spatial data looks like this:

----------------------
zipcode | AFFGEOID10
 10001  |  477175
 10002  |  2118827
 10003  |  78588518
 10004  |  9804480
 10005  |  4845242
  ...   |    ..
-----------------------

So my non-spatial data has more observations than my spatial data. Each zipcode appears only once in the spatial data, but zipcodes repeat in the non-spatial data. I need to keep every observation in the non-spatial data for further analysis, which is why I used 'all.x = T' or 'all.y = T' in the merge function.

Emily
  • Maybe this could be done with tidyverse, joining the data.frame from the sp object with the other one – aldo_tapia Oct 26 '17 at 20:42
  • You need to provide your code. It is virtually impossible to guess at what you have done. This is a very straightforward process, so it is something that you are missing in your code. – Jeffrey Evans Oct 26 '17 at 20:49
  • You need to explain how you "merged" the data - did you use the "merge" function? With what parameters? What are the columns in your spatial and non-spatial data? How do they match up? – Spacedman Oct 26 '17 at 21:58
  • @JeffreyEvans, Sorry, Jeffrey, I provided the code that I used. Looking forward to your reply. – Emily Oct 27 '17 at 13:10
  • @Spacedman, I used 'merge' function. Merging itself was not a problem. I have matching ID for both data files and could merge them. I just want the result to be SpatialPolygonsDataFrame, not normal data frame. – Emily Oct 27 '17 at 13:12
  • Think I understand a bit better. But to be clear. You want to attach the spatial data (polygon) to each of the N records in a non-spatial data frame by matching on that code, so you end up with N records. Some of the N results could match the same spatial data frame row. Yes? – Spacedman Oct 27 '17 at 13:44
  • Because in that case the answer is probably to use match to work out which rows of the spatial data correspond to the non-spatial data, and then pull out those rows by subsetting. dsp = sp[match(d$Code, sp$Code),] where d and sp are the non-spatial and spatial data. Then cbind(dsp, d) to reattach the attributes of d. Look out for mismatches though. – Spacedman Oct 27 '17 at 13:49
  • @Spacedman, that's right. The result will have N records (of non-spatial data) because I used all.x=T. And I used zipcode as the matching ID for both data sets (both have zipcode, so I can match on it). I will try the method that you just suggested and let you know. – Emily Oct 27 '17 at 14:05
  • It looks like your call to merge is incorrect. The x argument should be an sp class object and y a data.frame, whereas you are passing the data.frame to x. Try using sp::merge(nation, us_urban_zipcode, by = "ZCTA5CE10") – Jeffrey Evans Oct 27 '17 at 14:07
  • @JeffreyEvans, I saw this post. The answer was written by you, and I think your approach is interesting. If I can build the SpatialPolygonsDataFrame (SPDF) with about 80,000 IDs (not 2 IDs as in your example), then I think this can be another potential solution to my question above. That is, I would first construct the SPDF with 80,000 IDs and then cbind the other variables that I would like to add. Does that make sense? If so, do you know how to construct the SPDF with 80,000 obs? https://gis.stackexchange.com/questions/141469/how-to-convert-a-spatialpolygon-to-a-spatialpolygonsdataframe-and-add-a-column-t – Emily Oct 27 '17 at 14:07
  • @JeffreyEvans, I tried the code that you suggested (us_urban_index_nation2 <- sp::merge(nation, us_urban_zipcode, by = "ZCTA5CE10", all.y=T)), and it gave the following error --> Error in .local(x, y, ...) : non-unique matches detected. I don't understand, because the merge code in my post worked fine without any error. The problem with that code is that it returned a DF, not an SPDF. Do you have any idea why the error occurred? – Emily Oct 27 '17 at 14:24
  • It means either that your ID field "ZCTA5CE10" is not unique or that the dimensions of the join do not match the dimensions of the spatial object. Why did you use all.y=TRUE? You are telling the functions that it must retain lines in y that do not match lines in x. If your data.frame is larger than the sp object, this will throw an error. You really need to read the help for the function ?sp::merge – Jeffrey Evans Oct 27 '17 at 14:28
  • @JeffreyEvans, when I used duplicateGeoms = T, it didn't give any error. The merge went through, but it didn't right-join correctly (I used all.y=T). I want to join based on the non-spatial data, which in this case is a right join, I guess. Do you have any thoughts on how to fix this? – Emily Oct 27 '17 at 15:00
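
Spacedman's match-and-subset suggestion in the comments above can be sketched on toy data roughly as follows. The unit squares, the helper make_square and the values in d are made up for illustration; only the column name ZCTA5CE10 comes from the question. The sketch assumes that a SpatialPolygonsDataFrame can be subset with repeated row indices, which is what the suggestion relies on.

```r
library(sp)

# Hypothetical helper: a unit square starting at x0, standing in for one ZCTA polygon
make_square <- function(x0, id) {
  coords <- cbind(c(x0, x0 + 1, x0 + 1, x0, x0), c(0, 0, 1, 1, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

# Toy spatial data: one polygon per zip code
zips  <- c("10001", "10002", "10003")
polys <- SpatialPolygons(lapply(seq_along(zips),
                                function(i) make_square(i - 1, zips[i])))
nation <- SpatialPolygonsDataFrame(
  polys,
  data.frame(ZCTA5CE10 = zips, row.names = zips, stringsAsFactors = FALSE)
)

# Toy non-spatial data: repeated zip codes plus one code with no polygon
d <- data.frame(ZCTA5CE10 = c("10003", "10002", "10003", "99999", "10002"),
                row = 1:5, stringsAsFactors = FALSE)

# For each row of d, find the matching row of the spatial object
idx  <- match(d$ZCTA5CE10, nation$ZCTA5CE10)
keep <- !is.na(idx)  # rows without a polygon cannot live in an sp object

# Pull out (and repeat) the polygons in the order of d, then reattach d's attributes
dsp <- nation[idx[keep], ]
dsp$row <- d$row[keep]

class(dsp)
nrow(dsp@data)
```

The result has one feature per matched row of d, so the same polygon appears several times. Rows such as the hypothetical "99999" have to be dropped or handled separately, because they have no geometry.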

1 Answer


I would recommend reading your shapefile in with rgdal::readOGR. If you run into performance issues you should look up how to read in spatial data and merge data using the sf library and the simple features workflow.

For this to work, I like the columns being merged on to have identical names before I perform the merge. You can also specify the column names using the by.x and by.y arguments of the merge function.

library(rgdal)
mydf   <- read.csv("myCsv.csv")
myspdf <- readOGR("myShapefile.shp")

## then merge using sp's merge function
mynewspdf <- merge(myspdf, mydf)

You may get a "non-unique matches detected" error, in which case you can try:

mynewspdf <- merge(myspdf, mydf, duplicateGeoms = T)

For more information, see https://www.rdocumentation.org/packages/sp/versions/1.2-5/topics/merge
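
To make the effect of duplicateGeoms concrete, here is a minimal sketch on toy data. The polygons, the helper make_square and the values in mydf are invented for illustration. It also assumes, going by the sp help page for merge, that all.x defaults to TRUE, so polygons without a match in the data frame are kept with NA attributes unless all.x = FALSE is passed.

```r
library(sp)

# Hypothetical helper: a unit square starting at x0, standing in for one polygon
make_square <- function(x0, id) {
  coords <- cbind(c(x0, x0 + 1, x0 + 1, x0, x0), c(0, 0, 1, 1, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

# Toy spatial data with a unique key per polygon
zips   <- c("10001", "10002", "10003")
polys  <- SpatialPolygons(lapply(seq_along(zips),
                                 function(i) make_square(i - 1, zips[i])))
myspdf <- SpatialPolygonsDataFrame(
  polys,
  data.frame(ZCTA5CE10 = zips, row.names = zips, stringsAsFactors = FALSE)
)

# Toy non-spatial data: keys repeat, and "10001" never appears
mydf <- data.frame(ZCTA5CE10 = c("10003", "10002", "10003", "10002"),
                   row = 1:4, stringsAsFactors = FALSE)

# Spatial object first, data frame second; polygons are repeated per match
merged_all <- merge(myspdf, mydf, by = "ZCTA5CE10", duplicateGeoms = TRUE)

# Same merge, but dropping polygons that have no match in mydf
merged_matched <- merge(myspdf, mydf, by = "ZCTA5CE10",
                        all.x = FALSE, duplicateGeoms = TRUE)

class(merged_all)
nrow(merged_all@data)      # matched rows plus unmatched polygons
nrow(merged_matched@data)  # matched rows only
```

If the first count is larger than the number of rows in the data frame, the difference is probably made up of polygons that no data frame row refers to.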

JMT2080AD
  • Thanks for the reply. As you said, I got the "non-unique matches detected" error and used duplicateGeoms = T. I didn't get any error, but the data is bigger than I expected. The code that I used is us_urban_index_nation2 <- sp::merge(nation, us_urban_zipcode, by = "ZCTA5CE10", all.y=T, duplicateGeoms = TRUE) because I need a right join of the data. (The number of obs in the non-spatial data is 80,000, but the code resulted in approximately 110,000 obs. It seems that the right-join command doesn't work well.) – Emily Oct 27 '17 at 14:51
  • I think the tool works as expected. I bet you have something going on with your data that causes more duplication than expected. I have a lot of experience using that tool and outer joins in general, and I can tell you that it is probably your data. Also, I think you have a logical error in your approach. You want to merge an sp object with a data frame but keep all the records in your data frame even if they don't match your sp object, then return an sp object. This is not how an sp object works. An sp object needs to have a geometry for every record. Your query logic does not allow for that. – JMT2080AD Oct 27 '17 at 21:02
  • How many records do you get if you remove all.y = T from your query? – JMT2080AD Oct 27 '17 at 21:31
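
One way to follow up on the row-count question in the comments above is to compare the keys on both sides before merging. The sketch below uses only base R on made-up values; spatial_ids stands in for nation$ZCTA5CE10 and d for the non-spatial data frame, and the final count assumes the merge keeps unmatched polygons (all.x = TRUE).

```r
# Hypothetical keys: spatial_ids plays the role of nation$ZCTA5CE10
spatial_ids <- c("10001", "10002", "10003", "10004", "10005")

# Hypothetical non-spatial data with repeated and unmatched zip codes
d <- data.frame(ZCTA5CE10 = c("10003", "10002", "10003", "10004", "10002", "99999"),
                row = 1:6, stringsAsFactors = FALSE)

# Is the key really unique on the spatial side?
spatial_has_duplicates <- anyDuplicated(spatial_ids) > 0

# Zip codes in the data frame that have no polygon (cannot be kept in an sp object)
codes_without_polygon <- setdiff(d$ZCTA5CE10, spatial_ids)

# Polygons that no data frame row refers to (kept with NA if all.x = TRUE)
polygons_without_data <- setdiff(spatial_ids, d$ZCTA5CE10)

# Data frame rows that will find a polygon
n_matched_rows <- sum(d$ZCTA5CE10 %in% spatial_ids)

# Row count to expect from a merge that duplicates geometries and keeps
# unmatched polygons
n_expected <- n_matched_rows + length(polygons_without_data)

spatial_has_duplicates
codes_without_polygon
polygons_without_data
n_expected
```

If n_expected matches the unexpectedly large result, the extra records are unmatched polygons rather than a faulty join. Duplicate keys on the spatial side would inflate the count further.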