4

I have downloaded shapefiles of county/state boundaries and census data keyed to those shapefiles from the NHGIS. Both the shapefiles and the census data have a GISJOIN column to be used for merging the data. I'm able to do this just fine in QGIS, but I can't for the life of me get it to work in R/ggplot2.

This is what the data looks like:

str(churches_1850)
'data.frame':   36 obs. of  30 variables:
 $ GISJOIN : Factor w/ 36 levels "G010","G050",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ YEAR    : int  1850 1850 1850 1850 1850 1850 1850 1850 1850 1850 ...
 $ STATE   : Factor w/ 36 levels "Alabama","Arkansas",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ STATEA  : int  10 50 60 90 100 110 120 130 170 180 ...
 $ COUNTY  : logi  NA NA NA NA NA NA ...
 $ COUNTYA : logi  NA NA NA NA NA NA ...
 $ AREANAME: Factor w/ 36 levels "Alabama","Arkansas",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ AET001  : int  579 114 1 114 12 6 56 885 282 430 ...
 $ AET002  : int  18 NA NA 4 NA NA NA 5 69 187 ...
 $ AET003  : int  NA NA NA 252 NA NA NA 1 46 2 ...
 $ AET004  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ AET005  : int  NA NA NA NA NA NA NA NA 2 5 ...
 $ AET006  : int  17 2 1 101 21 8 10 20 27 24 ...
 $ AET007  : int  5 1 NA 1 NA NA 1 6 2 10 ...
 $ AET008  : int  NA NA NA 5 9 1 NA 2 6 89 ...
 $ AET009  : int  NA NA NA NA NA NA NA NA 3 5 ...
 $ AET010  : int  NA NA NA 2 NA NA NA NA NA NA ...
 $ AET011  : int  1 NA NA NA NA 2 NA 8 42 63 ...
 $ AET012  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ AET013  : int  577 168 5 185 106 16 87 809 405 779 ...
 $ AET014  : int  NA NA NA NA NA NA NA 1 2 57 ...
 $ AET015  : int  162 52 3 17 26 6 16 97 206 282 ...
 $ AET016  : int  NA NA NA NA NA NA 2 NA NA NA ...
 $ AET017  : int  5 7 18 12 3 6 5 8 59 63 ...
 $ AET018  : int  NA NA NA NA NA NA NA NA 2 NA ...
 $ AET019  : int  NA NA NA NA NA NA NA NA 4 5 ...
 $ AET020  : int  4 5 NA 4 1 NA NA 16 30 5 ...
 $ AET021  : int  1 NA NA 5 NA 1 NA NA 4 1 ...
 $ AET022  : int  3 NA NA 22 NA NA NA 3 7 15 ...
 $ AET023  : int  3 13 NA 10 2 NA NA 7 25 13 ...

This is what I'm trying so far:

library(rgdal)
library(ggplot2)
states_1850 <-readOGR(dsn = "shp/nhgis0007_shapefile_tl2000_us_state_1850/",
                        layer = "US_state_1850")
churches_1850 <- read.csv("csv/nhgis0007_ds10_1850_state.csv")
states_1850_df <- fortify(states_1850, region = "GISJOIN")
states_with_data <- merge(states_1850_df, churches_1850, by.x="id", by.y="GISJOIN")

ggplot() +
  geom_polygon(data=states_with_data,
               aes(x=long, y=lat, group=group, fill=AET001)) +
  coord_map()

I've tried a number of other ways of merging the data, but they all either end up not merging correctly, or tearing the shapes of the states/counties, or being so large that plotting them crashes my machine (8GB ram, shouldn't be a problem).

This seems like it should be easier. What's the proper way to do this?

UPDATE: This method of merging data frames seems to be the older way of doing things. Using geom_map is much more powerful, since it merges the data on the fly rather than at the beginning. It also allows for easier faceting. See the ggplot2 docs for geom_map.

My revised plotting code looks like this:

p <- ggplot() + 
  geom_map(data = churches_1850, 
           aes(map_id = GISJOIN, fill = AET015),
           map = shp_1860, size=0.01, alpha=.9) +
  expand_limits(x = shp_1860$long, y=shp_1860$lat) +
  scale_fill_gradient2(low="white", high="red")
print(p)
Lincoln Mullen
  • 687
  • 7
  • 14
  • 1
    It might help to include a description of the results you are getting, and how they are in error. Could it have to do with the order that you are merging the layers together? If you switch states_1850_df and churches_1850 and their respective columns in the merge command, does it make a difference? – Get Spatial Oct 01 '13 at 23:02
  • I notice that GISJOIN is a factor. Does it have the same levels and use identical codes in both files? If not, see the R help for factor and as.factor for how to make them match. Alternatively, study the help for read.csv for how to keep this field as a string type. – whuber Oct 02 '13 at 14:39
  • Could it be that you need to set sort = FALSE in the merge call, as per this question? – SlowLearner Oct 03 '13 at 19:01
  • 1
    @whuber: I tried setting the option stringsAsFactors=F for both the fortified shapefile data frame and the data frame with the actual data. That seemed to have helped. – Lincoln Mullen Oct 04 '13 at 15:48
  • 1
    @SlowLearner: I had tried sort= FALSE before and it didn't work now. But ordering the fortified data frame this way, did work: d_1860 <-d_1860[order(d_1860$order),]. If you leave an answer I'll mark it as correct. – Lincoln Mullen Oct 04 '13 at 15:50
  • I found a better way to do this using geom_map detailed in an update above. – Lincoln Mullen Oct 04 '13 at 15:56
  • If your solution works it may be worth using it as the answer to your own question. – SlowLearner Oct 04 '13 at 17:24

1 Answers1

1

This method of merging data frames seems to be the older way of doing things. Using geom_map is much more powerful, since it merges the data on the fly rather than at the beginning. It also allows for easier faceting. See the ggplot2 docs for geom_map.

My revised plotting code looks like this:

shp_1860 <- readOGR("/home/lmullen/research-data/nhgis-shapefiles/county_1860_simple/",
                    "US_county_1860",
                    stringsAsFactors = F)

shp_1860 <- fortify(shp_1860, region="GISJOIN")

c_1860 <- read.csv("~/research-data/nhgis-religion/ds14_1860_county.csv",
                   stringsAsFactors=F)

p <- ggplot() + 
  geom_map(data = c_1860, 
           aes(map_id = GISJOIN, fill = AET015),
           map = shp_1860) +
  expand_limits(x = shp_1860$long, y=shp_1860$lat) +
  scale_fill_gradient2(low="white", high="red")
print(p)
Lincoln Mullen
  • 687
  • 7
  • 14