1

In order to create my tmap, I merged my spatial data (which includes State ZIP Codes, Cities and ZIP Codes Polygon Geometry) with my non-spatial data (Individual Personal Info, Individual's ZIP Code, and other variables). In total I have 1500 individuals so as you can imagine, I have more than one individual per ZIP Code.

Therefore, by the time I plot the tmap for a specific variable (e.g weight) I am not sure whether it plots only one individual per polygon or plots a consensus (e.g if most individuals with the same polygon geometry weight 132 lb, the polygon color is one that reflects 132 lb).

How can I check that and what can I do in order to represent all the individuals per polygon?

# Open libraries
knitr::opts_chunk$set(echo = TRUE, warning=FALSE)
suppressMessages(library(tidyverse))  # Modern data science workflow
suppressMessages(library(sf)) # Simple features for R
suppressMessages(library(dplyr))
suppressMessages(library(tmap))       # Thematic Maps
suppressMessages(library(tmaptools))  # Thematic Maps Tools
suppressMessages(library(RColorBrewer)) # ColorBrewer Palettes
suppressMessages(library(leaflet))    # Interactive web maps
suppressMessages(library(rgdal))      # Bindings for the Geospatial Data Abstraction Library
suppressMessages(library(rgeos))      # Interface to Geometry Engine - Open Source 
suppressMessages(library(rio)) # for reading in data
suppressMessages(library(htmlwidgets))
suppressMessages(library(xlsx))  # trabajar con excel
suppressMessages(library(gifski))

Read shapefile (includes ZIP Code and Geometry = Polygons)

cp_cdmx <- read_sf("~/CP_CdMx")

Read file with districts and ZIP Code

deleg_cdmx <- read.csv("~/cp_deleg_cdmx_uniq.csv", header = FALSE)

Exchange names from district file

colnames(deleg_cdmx) <- c("d_cp", "delegacion") deleg_cdmx$d_cp <- as.character(deleg_cdmx$d_cp)

Join geometry, ZIP code and district. This join is by ZIP code

cp_deleg_cdmx <- cp_cdmx %>% left_join(deleg_cdmx, by = 'd_cp')

Import non-spatial data

basal_db <- rio::import("~/subset_basal.xlsx")

Rename ZIP code column for common name with the other file

names(basal_db)[names(basal_db) == "cp_actual"] <- "d_cp"

Process data

cp_deleg_cdmx$d_cp <- as.numeric(cp_deleg_cdmx$d_cp) basal_db$d_cp <- as.numeric(basal_db$d_cp) # NAs are inserted where no ZIP code exist

Merge spatial and non-spatial data

The merge is by ZIP code

basal_map_cdmx <- merge(cp_deleg_cdmx, basal_db, by = "d_cp")

Keep the data as sf class, so we will not lose the coordinate system

basal_cdmx <- st_as_sf(basal_map_cdmx) st_crs(cp_deleg_cdmx)

Plot Map

tmap_mode("plot")

First layer = all CDMX ZIP codes

map <- tm_shape(cp_cdmx) + tm_borders(col = "#737373", alpha = 0.3) +

Second layer

tm_shape(basal_cdmx) + tm_polygons(col = "actfisicatrabajo", title = "Post-Job Workout", palette = 'Blues', legend.hist = TRUE, legend.z = 0, legend.hist.z = 0.5, legend.shape.z = 0, border.alpha = 0.1) + tm_facets(by = "sexo", as.layers = FALSE, free.scales = TRUE, showNA = FALSE) + tm_legend(text.size=1, title.size=1.2, position = c("left","top"), bg.color = "white", bg.alpha= 0.1, frame= FALSE, height= 0.3, hist.width= 0.2, hist.height= 0.3, hist.bg.alpha= 0.5) + tm_scale_bar(color.dark = "gray60", position = c("right", "bottom")) + tm_view(colorNA = NULL, view.legend.position = c("right","top"), set.zoom.limits = c(5, 15)) + tm_layout(frame = "gray60", frame.lwd = 0, inner.margins = 0, outer.margins = 0, asp = NA, legend.hist.width = 0.15, legend.hist.height = 0.15, legend.height = 0.4, legend.width = 0.4, legend.outside.size = 0.2, panel.labels = c('Hombres','Mujeres'), panel.label.size = 1, panel.label.bg.color = "azure3")

All the files I use are in this Drive link

Yogesh Chavan
  • 2,705
  • 2
  • 9
  • 24

1 Answers1

1

Your (nonspatial) dataset has 1542 observations, but you only have 1050 POLYGONS. You cannot show all individuals using these polygons. There are several ways to visualize your data:

  1. Calculate and show the ratio of Yes/No instead of the actual number.
  2. Add figures for each polygon - see examples in ?tm_markers.
  3. Plot your data as points - jittered centroids of your polygons.
basal_map_cdmx <- full_join(cp_deleg_cdmx, basal_db, by = "d_cp")

basal_map_cdmx_points <- st_centroid(basal_map_cdmx)

tm_shape(cp_cdmx) + tm_borders(col = "#737373", alpha = 0.3) + tm_shape(basal_map_cdmx_points) + tm_dots(jitter = 0.03, col = "actfisicatrabajo", palette = 'Set2') + tm_facets(by = "sexo", as.layers = FALSE, free.scales = TRUE, showNA = FALSE)

enter image description here

Jot eN
  • 972
  • 8
  • 21