1

I'm looking for a way to take zip code shapefiles and merge them into larger areas. I've looked up several links but I just can't seem to figure it out, and I keep getting stuck on the cut function in the examples that I have reviewed. All of the examples that I have been able to find use the cut function in R but they all split arbitrarily using a percentile function. I want to aggregate based on a grouping that isn't guaranteed to have a simple numerical range that defines the group. I am also curious if anyone knows why the Census data is missing some zip codes? I have pasted my code below.

Here are the links that I have already checked:


# Objective:  Read Zipcode polygons (currently using census source, but it appears to missing some zipcodes:  Anybody know why?),
# Merge zipcodes into 'sub counties' (an internal concept, each zipcode belongs to a single sub county) and aggregate data by that 'sub county'
# Ideally this would output a TopoJSON map

# Required packages
pacman::p_load("rgdal","rgeos","maptools","gridExtra","geojsonio")

# Import US zipcode data (original source is here: http://www2.census.gov/geo/tiger/GENZ2015/shp/cb_2015_us_zcta510_500k.zip)
us <- readOGR(dsn = "C:/Users/MyUsername/Documents/R/Data/ZipShapes", layer = "cb_2015_us_zcta510_500k")

# Create a subsetted US polygon for only AZ, simplifying for this single example
AZZips <- as.factor(85001:87000)
AZOnly <- subset(us,us@data$ZCTA5CE10 %in% AZzips)
AZOnly@data <- droplevels(AZOnly@data)
AZOnlyDF <- AZOnly@data
AZOnlyDF$ID <- row.names(AZOnly@data)

# Data frame with zip code and 'sub county' relationship
Col1 <- as.factor(
      c("85938","85932","85924","85920","85930","85940","85927","85925","86502","85936","86505","86511","86545","86028","86556","86507"
        ,"86547","86535","86508","86512","86544","86514","86504","86549","86506","86538","86515","86540","86503","86023","86020","86036"
        ,"86015","86022","86044","86017","86053","86038","86024","86035","86018","85931","86045","86435","86339","86016","86052","86040"
        ,"86001","86005","86003","86004","86002","86046","85603","85607","85636","85643","85627","85616","85638","85670","85608","85630"
        ,"85625","85671","85602","85650","85632","85620","85615","85626","85635","85606","85655","85613","85617","85610","85644","85605"
        ,"85609","85552","85543","85548","85546","85536","85541","85553","85235","85544","85532","85542","85539","85554","85501","85502"
        ,"85547","85550","85545","85292","85533","85534","85357","85371","85328","85344","85325","85359","85346","85334","85348","85379"
        ,"85387","85311","85376","85358","85372","85382","85312","85374","85308","85373","85375","85310","85378","85342","85388","85390"
        ,"85383","85361","85318","85224","85246","85286","85225","85280","85249","85284","85233","85287","85226","85283","85244","85227"
        ,"85285","85248","85236","85295","85298","85299","85296","85234","85289","85212","85297","85082","85024","85074","85029","85022"
        ,"85027","85079","85050","85053","85068","85064","85061","85020","85025","85076","85072","85066","85083","85062","85054","85075"
        ,"85071","85032","85069","85023","85073","85063","85028","85021","85051","85067","85060","85080","85078","85290","85320","85337"
        ,"85306","85037","85304","85301","85035","85039","85380","85030","85363","85031","85303","85381","85307","85305","85033","85038"
        ,"85026","85351","85043","85335","85345","85339","85353","85036","85385","85302","85205","85216","85282","85203","85208","85215"
        ,"85201","85214","85206","85281","85274","85204","85211","85209","85275","85210","85202","85277","85207","85213","85327","85331"
        ,"85086","85087","85085","85377","85396","85340","85329","85343","85338","85323","85354","85355","85395","85309","85322","85326"
        ,"85392","85268","85250","85271","85261","85259","85255","85256","85267","85254","85070","85018","85269","85251","85258","85260"
        ,"85257","85253","85008","85266","85262","85264","85252","85263","85045","85001","85016","85007","85017","85040","85014","85042"
        ,"85006","85003","85019","85015","85010","85002","85046","85044","85041","85009","85012","85034","85048","85005","85011","85004"
        ,"85013","86443","86021","86404","86438","86433","86403","86437","86445","86405","86411","86406","86436","86409","86434","86413"
        ,"86402","86431","86444","86441","86412","86401","85360","86446","86426","86430","86439","86427","86429","86440","86442","86432"
        ,"85901","85933","86031","85929","85939","85935","86520","86047","85923","85934","86032","85902","85926","85912","85941","85942"
        ,"86029","85911","85937","85928","86025","86033","86034","86510","86043","86042","86054","86030","86039","85752","85706","85737"
        ,"85720","85719","85716","85745","85740","85714","85704","85754","85725","85731","85711","85732","85715","86704","85756","85702"
        ,"85728","85735","85722","85748","85746","85713","85705","85710","85741","85703","85750","85726","85730","85708","85701","85717"
        ,"85751","85743","85742","85734","85723","85733","85755","85749","85712","85747","85736","85718","85757","85629","85707","85622"
        ,"85614","85321","85652","85634","85653","85739","85658","85738","85641","85618","85219","85220","85623","85237","85218","85240"
        ,"85119","85243","85278","85118","85120","85143","85140","85217","85242","85142","85631","85130","85222","85294","85194","85230"
        ,"85293","85122","85238","85193","85223","85145","85132","85123","85231","85232","85139","85279","85128","85239","85138","85228"
        ,"85131","85648","85662","85637","85646","85645","85640","85624","85628","85621","85611","86334","85332","85324","86330","86332"
        ,"86343","86321","86357","86329","86303","86304","86323","86338","85362","86301","86315","86312","86305","86331","86302","86313"
        ,"86314","86327","86324","86333","86335","86342","86351","86340","86322","86336","86341","86325","86326","86320","85350","85364"
        ,"85366","85347","85367","85356","85333","85369","85365","85336","85349"
        ))
Col2 <- as.factor(
      c("APA-1","APA-1","APA-1","APA-1","APA-1","APA-1","APA-1","APA-1","APA-1","APA-1","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2"
        ,"APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","APA-2","CNC-1","CNC-1","CNC-1"
        ,"CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1","CNC-1"
        ,"CNC-2","CNC-2","CNC-2","CNC-2","CNC-2","CNC-3","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1"
        ,"COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1","COH-1"
        ,"COH-1","GHM-1","GHM-1","GHM-1","GHM-1","GHM-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1","GLA-1"
        ,"GLA-1","GLA-1","GLA-1","GLA-1","GNL-1","GNL-1","LAP-1","LAP-1","LAP-1","LAP-1","LAP-1","LAP-1","LAP-1","LAP-1","LAP-2","MAR-1"
        ,"MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1","MAR-1"
        ,"MAR-1","MAR-1","MAR-1","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10","MAR-10"
        ,"MAR-10","MAR-10","MAR-10","MAR-11","MAR-11","MAR-11","MAR-11","MAR-11","MAR-11","MAR-11","MAR-11","MAR-11","MAR-2","MAR-2","MAR-2"
        ,"MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2"
        ,"MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-2","MAR-20","MAR-20","MAR-20"
        ,"MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3"
        ,"MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-3","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4"
        ,"MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-4","MAR-5","MAR-5","MAR-5","MAR-5","MAR-5"
        ,"MAR-5","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6","MAR-6"
        ,"MAR-6","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7","MAR-7"
        ,"MAR-7","MAR-7","MAR-7","MAR-8","MAR-8","MAR-8","MAR-8","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9"
        ,"MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MAR-9","MOH-1","MOH-1"
        ,"MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-1","MOH-2","MOH-2","MOH-2","MOH-2","MOH-2","MOH-2","MOH-2"
        ,"MOH-2","MOH-2","MOH-2","MOH-3","MOH-3","MOH-3","MOH-3","MOH-3","MOH-3","MOH-3","MOH-3","MOH-4","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1"
        ,"NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-1","NVJ-2"
        ,"NVJ-2","NVJ-2","NVJ-2","NVJ-2","NVJ-2","NVJ-2","NVJ-2","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1"
        ,"PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1"
        ,"PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1","PIM-1"
        ,"PIM-1","PIM-1","PIM-1","PIM-1","PIM-2","PIM-2","PIM-2","PIM-2","PIM-20","PIM-20","PIM-20","PIM-3","PIM-3","PIM-3","PIM-3","PIM-3"
        ,"PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1","PIN-1"
        ,"PIN-1","PIN-2","PIN-2","PIN-2","PIN-2","PIN-2","PIN-2","PIN-2","PIN-2","PIN-2","PIN-3","PIN-3","PIN-3","PIN-3","PIN-3","PIN-3","PIN-3"
        ,"PIN-3","PIN-3","PIN-3","PIN-3","PIN-3","PIN-3","SC-1","SC-1","SC-1","SC-1","SC-1","SC-1","SC-1","SC-1","SC-1","SC-1","YAV-1","YAV-1"
        ,"YAV-1","YAV-1","YAV-1","YAV-1","YAV-1","YAV-1","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2","YAV-2"
        ,"YAV-2","YAV-2","YAV-2","YAV-2","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-3","YAV-4","YUM-1"
        ,"YUM-1","YUM-1","YUM-1","YUM-1","YUM-1","YUM-1","YUM-1","YUM-1","YUM-1","YUM-1"
        ))
Col3 <- c(
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,0,0,116,0,0,0,0,17,0,0,0,0,0,0,0,243,195,0,418,0,53,0,1,0,1,0,4,0,0,0
        ,0,1,0,2,10,0,0,4,0,17,0,0,0,0,2,0,0,1,0,0,0,0,0,59,0,0,19,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,81,3,0,11,1,0,150,28,0,0,0,0,172,0,135,243,50,70
        ,94,16,2,102,5,182,13,0,192,0,178,291,0,231,83,192,0,189,150,0,0,0,231,0,177,110,0,213,267,0,104,149,0,112,0,101,201,129,0,179,78,0,0,0,137
        ,0,0,0,0,65,0,30,0,0,268,0,111,0,0,113,64,84,0,0,0,0,0,0,0,81,121,101,52,43,0,0,0,21,25,71,83,14,33,86,0,23,99,84,121,156,160,90,0,0,89,155
        ,0,177,94,153,77,107,0,112,123,0,130,0,160,0,92,128,0,169,99,0,242,322,44,90,42,40,107,0,0,176,93,9,27,93,0,0,155,188,186,158,0,0,229,395,4
        ,0,323,0,156,0,189,292,359,122,113,68,121,116,1,0,8,47,0,109,23,39,47,84,106,29,16,42,64,0,0,0,189,170,22,23,5,205,0,0,10,81,0,0,306,0,0,313
        ,0,0,0,0,513,12,90,0,55,0,0,7,1,0,123,0,0,211,0,1,0,96,83,345,2,53,17,0,15,0,25,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,31,66,0,21
        ,18,48,0,5,44,0,0,0,41,0,33,0,28,0,0,9,0,38,53,35,29,77,44,0,68,0,68,0,2,0,0,68,69,0,0,0,30,30,25,52,3,58,21,45,0,9,61,0,0,0,14,25,27,0,35
        ,0,3,3,1,0,1,4,52,6,0,74,61,198,149,0,12,229,0,0,16,2,11,0,0,141,5,5,4,3,50,19,0,1,70,0,40,2,191,0,6,12,0,1,5,2,0,0,0,1,0,38,0,6,0,0,0,0,0
        ,2,133,0,99,3,2,189,82,0,125,2,0,0,228,90,45,14,45,5,211,0,96,273,0,92,284,0,82,765,0,0,303,13,0,0,617,1,37
      )
AZSubCounties<-data.frame(ZipCode = Col1, SubCounty = Col2, NumberToAggregate = Col3)

# Merge dataframes
MergedDF <- merge(AZSubCounties,AZOnlyDF,by.x="ZipCode",by.y="ZCTA5CE10",all.y=TRUE)
AZOnly@data <- MergedDF

#Plot original census zip codes
spplot(AZOnly,"NumberToAggregate",main = "Original census zip codes")

#Aggregate NumberToAggregate at the 'sub county' level
AZCoords <- coordinates(AZOnly)
AZIDs <- cut(AZCoords[,1], ????, include.lowest=TRUE)
AZOnlyDFAgg <- aggregate(MergedDF[,3], list(AZIDs), sum)

# This where I get stuck, I know that I'm supposed to union the polygons together somehow.  
# I've done research but I don't understand the 'cut' examples provided and how that works together with the IDs and polygons
AZUnion <- unionSpatialPolygons(AZOnly, AZIDs)

# Reconvert data frame to SpatialPolygons
NewAZ <- SpatialPolygonsDataFrame(AZUnion, AZOnlyDFAgg)

# Output geojson file (I'd like to output TopoJSON, but geojsonio doesn't have that functionality yet)
geojson_write(NewAZ,file="C:/Users/MyUsername/Documents/R/Data/NewAZ.geojson")
New Guy
  • 53
  • 1
  • 6
  • This might answer your question about missing zip codes: http://www.georeference.org/doc/zip_codes_are_not_areas.htm. Every physical location within the territorial US does not necessarily have one, which is why there are "gaps" in the shapefile. – iskandarblue Jan 25 '17 at 00:09
  • Does it matter that you have NA values in MergedDF? sum(is.na(MergedDF)) [1] 46 – iskandarblue Jan 25 '17 at 04:20

1 Answers1

1

Here is a simple example aggregating zipcodes that fall within the state of Minnesota. The source for the states geojson file is here.

First, import states and zips:

require(sp)

zips <- readOGR("cb_2015_us_zcta510_500k.shp")
states <- readOGR("states.geojson", "OGRGeoJSON")

then with @proj4string you will notice that the dataframes have differing CRS

Change the CRS of states to be identical to that of zips with:

states <- spTransform(states, CRS(proj4string(zips)))

Now we check and they are equal

> proj4string(states)
[1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
> proj4string(zips)
[1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"

Okay, now let's subset states by a random state - Minnesota:

MN <- states[states@data$name =='Minnesota', ]
plot(MN)

enter image description here

One great thing about R is that we can perform an intersect query between two spatial dataframes just as easily as we can subset a matrix. For example, if we want to see which zipcodes intersect with MN, we subset zips:

zips_subset <- zips[MN,]

plot(zips_subset)
plot(MN, border="green", lwd = 3, add = TRUE)

enter image description here

Now, if we want to dissolve all of the zips_subset polygons, it is simple:

> dissolved <-  gUnaryUnion(zips_subset)
> plot(dissolved)

enter image description here

Solution to subcounties:

1. Get rid of NA values in AZOnly because they do not contain a $SubCounty attribute:

> sum(is.na(AZOnly@data))
[1] 46

define a function to omit NAs in sp:

sp.na.omit <- function(x, margin=1) {
  if (!inherits(x, "SpatialPointsDataFrame") & !inherits(x, "SpatialPolygonsDataFrame")) 
    stop("MUST BE sp SpatialPointsDataFrame OR SpatialPolygonsDataFrame CLASS OBJECT") 
  na.index <- unique(as.data.frame(which(is.na(x@data),arr.ind=TRUE))[,margin])
  if(margin == 1) {  
    cat("DELETING ROWS: ", na.index, "\n") 
    return( x[-na.index,]  ) 
  }
  if(margin == 2) {  
    cat("DELETING COLUMNS: ", na.index, "\n") 
    return( x[,-na.index]  ) 
  }
}

> AZOnly <- sp.na.omit(AZOnly)

DELETING ROWS:  383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 

> sum(is.na(AZOnly@data))
[1] 0

2. Create a separate data.frame in which to aggregate NumberToAggregate by SubCounty and rename column to sum:

> index <- data.frame()
> index <- rbind(index, AZOnly@data)
> index <- aggregate(NumberToAggregate~SubCounty, sum, data=index)
colnames(index)[2] <- "sum"

3. Dissolve AZOnly by subcounties:

ScIDS <- AZOnly@data[["SubCounty"]]
AZ <- gUnaryUnion(AZOnly, id = ScIDS)

Now we are left with a Spatial Polygons object but we need a SpatialPolygonsDataFrame because we will need to merge with index

> class(AZ)
[1] "SpatialPolygons"
attr(,"package")
[1] "sp"

4. Convert AZ to SpatialPolygonsDataFrame and merge by SubCounty:

The row.names are by SubCountry in AZ so they must be the same in index

head(row.names(AZ))
[1] "APA-1" "APA-2" "CNC-1" "CNC-2" "CNC-3" "COH-1"

row.names(index) <- index$SubCounty

head(row.names(index))
[1] "APA-1" "APA-2" "CNC-1" "CNC-2" "CNC-3" "COH-1"

5. Once row.names are equal in both dataframes we can merge and then plot:

output <- SpatialPolygonsDataFrame(AZ, index)


spplot(output, "sum", main ="sums")

enter image description here

iskandarblue
  • 2,092
  • 2
  • 17
  • 34
  • I thank you for your help and I learned some useful new information with your example. My challenge with the solution that you presented is that I'm not trying to do a spatial join. I'm trying to create polygons that don't exist in any files by grouping already existing polygons together. It is quite possible that what you have proposed will solve my problem but I'm just not seeing it. – New Guy Jan 25 '17 at 04:21
  • I will try to go by your example - you have forgotten to define l – iskandarblue Jan 25 '17 at 04:32
  • Thanks. The code should work until you hit this line "AZIDs <- cut(AZCoords[,1], ????, include.lowest=TRUE)" because I'm not sure how to 'cut' coordinates or data frame the right way so that unionSpatialPolygons works correctly. – New Guy Jan 25 '17 at 04:36
  • you could cut using AZIDs <- cut(AZCoords[,1], unique(AZCoords[,1]), include.lowest=TRUE) if you want to have 405 breaks. But I'm guessing you want something like 42 breaks because that is the amount of subcounties. Essentially you want to dissolve by subcounty and then include the aggregate number of "zips" that went into that subcounty if I understand correctly? – iskandarblue Jan 25 '17 at 04:50
  • Yes, I want to dissolve by subcounty and then include the aggregated NumberToAggregate for each subcounty. – New Guy Jan 25 '17 at 05:00
  • BTW...I edited my earlier spplot command that had the errant 'l'. Not sure where that came from, but it has been fixed. – New Guy Jan 25 '17 at 06:01
  • I've edited my answer with a workaround - it's not the most efficient but works – iskandarblue Jan 25 '17 at 07:47
  • is it the correct output? – iskandarblue Jan 25 '17 at 08:52
  • Thank you. Based on what you presented, it looks like that SHOULD work, however when I run this line: 'AZ <- gUnaryUnion(AZOnly, id = ScIDS)' I receive the following error: 'Error in slot(p, "Polygons") : cannot get a slot ("Polygons") from an object of type "NULL"'. I can get everything up to that point to work. – New Guy Jan 25 '17 at 18:43
  • I stared a new instance of R and I was able to get your solution to work! Thanks! Do you happen to have any good links that explain WHY this works? I'm still not sure how R knows which lines to dissolve based only on the ID values of the polygons. – New Guy Jan 25 '17 at 22:52