I have downloaded shapefiles of county/state boundaries and census data keyed to those shapefiles from the NHGIS. Both the shapefiles and the census data have a GISJOIN column to be used for merging the data. I'm able to do this just fine in QGIS, but I can't for the life of me get it to work in R/ggplot2.
This is what the data looks like:
str(churches_1850)
'data.frame': 36 obs. of 30 variables:
$ GISJOIN : Factor w/ 36 levels "G010","G050",..: 1 2 3 4 5 6 7 8 9 10 ...
$ YEAR : int 1850 1850 1850 1850 1850 1850 1850 1850 1850 1850 ...
$ STATE : Factor w/ 36 levels "Alabama","Arkansas",..: 1 2 3 4 5 6 7 8 9 10 ...
$ STATEA : int 10 50 60 90 100 110 120 130 170 180 ...
$ COUNTY : logi NA NA NA NA NA NA ...
$ COUNTYA : logi NA NA NA NA NA NA ...
$ AREANAME: Factor w/ 36 levels "Alabama","Arkansas",..: 1 2 3 4 5 6 7 8 9 10 ...
$ AET001 : int 579 114 1 114 12 6 56 885 282 430 ...
$ AET002 : int 18 NA NA 4 NA NA NA 5 69 187 ...
$ AET003 : int NA NA NA 252 NA NA NA 1 46 2 ...
$ AET004 : int NA NA NA NA NA NA NA NA NA NA ...
$ AET005 : int NA NA NA NA NA NA NA NA 2 5 ...
$ AET006 : int 17 2 1 101 21 8 10 20 27 24 ...
$ AET007 : int 5 1 NA 1 NA NA 1 6 2 10 ...
$ AET008 : int NA NA NA 5 9 1 NA 2 6 89 ...
$ AET009 : int NA NA NA NA NA NA NA NA 3 5 ...
$ AET010 : int NA NA NA 2 NA NA NA NA NA NA ...
$ AET011 : int 1 NA NA NA NA 2 NA 8 42 63 ...
$ AET012 : int NA NA NA NA NA NA NA NA NA NA ...
$ AET013 : int 577 168 5 185 106 16 87 809 405 779 ...
$ AET014 : int NA NA NA NA NA NA NA 1 2 57 ...
$ AET015 : int 162 52 3 17 26 6 16 97 206 282 ...
$ AET016 : int NA NA NA NA NA NA 2 NA NA NA ...
$ AET017 : int 5 7 18 12 3 6 5 8 59 63 ...
$ AET018 : int NA NA NA NA NA NA NA NA 2 NA ...
$ AET019 : int NA NA NA NA NA NA NA NA 4 5 ...
$ AET020 : int 4 5 NA 4 1 NA NA 16 30 5 ...
$ AET021 : int 1 NA NA 5 NA 1 NA NA 4 1 ...
$ AET022 : int 3 NA NA 22 NA NA NA 3 7 15 ...
$ AET023 : int 3 13 NA 10 2 NA NA 7 25 13 ...
This is what I'm trying so far:
library(rgdal)
library(ggplot2)
states_1850 <-readOGR(dsn = "shp/nhgis0007_shapefile_tl2000_us_state_1850/",
layer = "US_state_1850")
churches_1850 <- read.csv("csv/nhgis0007_ds10_1850_state.csv")
states_1850_df <- fortify(states_1850, region = "GISJOIN")
states_with_data <- merge(states_1850_df, churches_1850, by.x="id", by.y="GISJOIN")
ggplot() +
geom_polygon(data=states_with_data,
aes(x=long, y=lat, group=group, fill=AET001)) +
coord_map()
I've tried a number of other ways of merging the data, but they all either end up not merging correctly, or tearing the shapes of the states/counties, or being so large that plotting them crashes my machine (8GB ram, shouldn't be a problem).
This seems like it should be easier. What's the proper way to do this?
UPDATE: This method of merging data frames seems to be the older way of doing things. Using geom_map is much more powerful, since it merges the data on the fly rather than at the beginning. It also allows for easier faceting. See the ggplot2 docs for geom_map.
My revised plotting code looks like this:
p <- ggplot() +
geom_map(data = churches_1850,
aes(map_id = GISJOIN, fill = AET015),
map = shp_1860, size=0.01, alpha=.9) +
expand_limits(x = shp_1860$long, y=shp_1860$lat) +
scale_fill_gradient2(low="white", high="red")
print(p)
states_1850_dfandchurches_1850and their respective columns in themergecommand, does it make a difference? – Get Spatial Oct 01 '13 at 23:02GISJOINis a factor. Does it have the same levels and use identical codes in both files? If not, see theRhelp forfactorandas.factorfor how to make them match. Alternatively, study the help forread.csvfor how to keep this field as a string type. – whuber Oct 02 '13 at 14:39sort = FALSEin themergecall, as per this question? – SlowLearner Oct 03 '13 at 19:01stringsAsFactors=Ffor both the fortified shapefile data frame and the data frame with the actual data. That seemed to have helped. – Lincoln Mullen Oct 04 '13 at 15:48sort= FALSEbefore and it didn't work now. But ordering the fortified data frame this way, did work:d_1860 <-d_1860[order(d_1860$order),]. If you leave an answer I'll mark it as correct. – Lincoln Mullen Oct 04 '13 at 15:50geom_mapdetailed in an update above. – Lincoln Mullen Oct 04 '13 at 15:56