Challenging visualization task: simultaneously visualizing 15 regions by 3 descriptive values

Question

I have a rather challenging visualization task.

I have 15 geographical regions (the data points for the plot), each described by 3 continuous values, and all of these values have confidence intervals. The aim is to visualise the data in a way that allows the regions to be compared or ranked (e.g. from poor to good, or from low disparity to high disparity).

For example:

region A: all descriptive values are very low
region C: two descriptive values very high, one very low
region F: all descriptive values are very high

Generated data

library(tidyverse)
set.seed(1)
# make new data: one row per region with three continuous measures
df <- tibble(
  region = LETTERS[1:15],
  prop = rnorm(15, 50, 20),
  income = rnorm(15, 10, 4),
  sd = rnorm(15, 2, 1))
# add confidence intervals
# note: each runif(1, ...) draws a single offset, so all 15 regions share the same CI width
df <- df %>% mutate(
  prop_lower = prop - runif(1, 5, 15),
  prop_upper = prop + runif(1, 5, 15),
  income_lower = income - runif(1, 0.5, 2.2),
  income_upper = income + runif(1, 0.5, 2.2),
  sd_lower = sd - runif(1, 0.1, 0.6),
  sd_upper = sd + runif(1, 0.1, 0.6))
# reorder columns so each estimate sits next to its bounds
df <- df[c("region", "prop", "prop_lower", "prop_upper", "income", "income_lower", "income_upper", "sd", "sd_lower", "sd_upper")]


 region      prop prop_lower prop_upper    income income_lower income_upper        sd  sd_lower sd_upper
      A 37.470924  30.074630   43.06027  9.820266    8.2283755    11.809923 3.3586796 2.8692222 3.857334
      B 53.672866  46.276572   59.26221  9.935239    8.3433489    11.924897 1.8972123 1.4077549 2.395867
      C 33.287428  25.891134   38.87677 13.775345   12.1834548    15.765003 2.3876716 1.8982143 2.886326
      D 81.905616  74.509322   87.49496 13.284885   11.6929947    15.274542 1.9461950 1.4567376 2.444849
      E 56.590155  49.193861   62.17950 12.375605   10.7837152    14.365263 0.6229404 0.1334831 1.121595
      F 33.590632  26.194338   39.17998 13.675909   12.0840194    15.665567 1.5850054 1.0955481 2.083660
      G 59.748581  52.352287   65.33792 13.128545   11.5366552    15.118203 1.6057100 1.1162527 2.104364
      H 64.766494  57.370200   70.35584 10.298260    8.7063699    12.287918 1.9406866 1.4512293 2.439341
      I 61.515627  54.119333   67.10497  2.042593    0.4507032     4.032251 3.1000254 2.6105680 3.598680
      J 43.892232  36.495938   49.48158 12.479303   10.8874130    14.468961 2.7631757 2.2737184 3.261830
      K 80.235623  72.839329   85.82497  9.775485    8.1835950    11.765143 1.8354764 1.3460191 2.334131
      L 57.796865  50.400571   63.38621  9.376818    7.7849279    11.366476 1.7466383 1.2571810 2.245293
      M 37.575188  30.178894   43.16453  4.116990    2.5251004     6.106648 2.6969634 2.2075060 3.195618
      N  5.706002  -1.690292   11.29535  8.087400    6.4955097    10.077057 2.5566632 2.0672059 3.055318
      O 72.498618  65.102324   78.08796 11.671766   10.0798762    13.661424 1.3112443 0.8217870 1.809899

Best option I have found for plotting so far

library(patchwork)
# panel 1: prop vs income, with CIs in both directions
# (plots are named p1/p2 to avoid confusion with regions A and B)
p1 <- df %>%
  ggplot(aes(x = prop, y = income, color = region, label = region)) +
  geom_errorbarh(aes(xmin = prop_lower, xmax = prop_upper), height = 0, alpha = 0.8) +
  geom_linerange(aes(ymin = income_lower, ymax = income_upper), linewidth = 0.5, alpha = 0.8) +
  # segment.alpha is a ggrepel argument that geom_text() ignores, so it is dropped here
  geom_text(size = 3.5, alpha = 0.8, nudge_x = 2, nudge_y = 0.5) +
  geom_point(size = 1.5) +
  theme_classic() +
  theme(legend.position = "none")
# panel 2: sd per region with its CI
p2 <- df %>%
  ggplot(aes(x = region, y = sd)) +
  geom_linerange(aes(ymin = sd_lower, ymax = sd_upper), linewidth = 0.5, alpha = 0.8) +
  geom_point(size = 1.5) +
  theme_classic() +
  theme(legend.position = "none")
# stack panel 1 above panel 2 (linewidth needs ggplot2 >= 3.4; use size in older versions)
p1 / p2 + plot_layout(heights = c(6, 2))

The problem is that you have to check multiple plot panels before drawing any conclusions about a region.

Or should I think outside the box and do something completely different?

For example, should I try to visualise these regions on a map? Are there any strategies for this (how to choose the colours, etc.)?

A random example from google photos

How do you visualise such data? Are there any promising three-axis plots?

Provided solutions so far

cdalitz suggested a parallel coordinates plot. It should be doable with CIs by making 3 separate ggplots; the connecting lines and labels can be added afterwards in a graphics program. I made a sketch of the idea.
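A minimal sketch of how this could be done in a single ggplot instead of three, using the generated data from above. The rescaling of each variable to [0, 1] (using the range of its CI bounds) and the dodge width are assumptions made here so that the three axes are comparable and the CIs do not overlap; the helper column names (`est`, `est_s`, `lower_s`, `upper_s`) are only illustrative.

```r
library(tidyverse)

# Rebuild the generated data so this block runs on its own
set.seed(1)
df <- tibble(
  region = LETTERS[1:15],
  prop = rnorm(15, 50, 20),
  income = rnorm(15, 10, 4),
  sd = rnorm(15, 2, 1)) %>%
  mutate(
    prop_lower = prop - runif(1, 5, 15),
    prop_upper = prop + runif(1, 5, 15),
    income_lower = income - runif(1, 0.5, 2.2),
    income_upper = income + runif(1, 0.5, 2.2),
    sd_lower = sd - runif(1, 0.1, 0.6),
    sd_upper = sd + runif(1, 0.1, 0.6))

# Long format: one row per region and variable, with columns est, lower, upper
long <- df %>%
  rename(prop_est = prop, income_est = income, sd_est = sd) %>%
  pivot_longer(-region, names_to = c("variable", ".value"), names_sep = "_") %>%
  mutate(variable = factor(variable, levels = c("prop", "income", "sd")))

# Assumption: rescale each variable to [0, 1] over the range of its CI bounds
long <- long %>%
  group_by(variable) %>%
  mutate(lo = min(lower), hi = max(upper)) %>%
  ungroup() %>%
  mutate(est_s = (est - lo) / (hi - lo),
         lower_s = (lower - lo) / (hi - lo),
         upper_s = (upper - lo) / (hi - lo))

# Parallel coordinates: one line per region, CIs drawn at each axis
pd <- position_dodge(width = 0.5)  # spreads the 15 regions so CIs do not overplot
p <- ggplot(long, aes(x = variable, y = est_s, colour = region, group = region)) +
  geom_line(position = pd, alpha = 0.6) +
  geom_linerange(aes(ymin = lower_s, ymax = upper_s), position = pd, alpha = 0.8) +
  geom_point(position = pd, size = 1.5) +
  labs(x = NULL, y = "Value rescaled to [0, 1] within each variable") +
  theme_classic()
p
```

With 15 colours the legend is hard to read, so direct labels at the last axis may work better. The order of the axes also changes which line crossings stand out.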

Can you provide a small example dataset for people to work with? It can be fake data, if yours are confidential. Do you want the CIs to show up in the plot? What do you mean by the CIs should be "comparable"? — gung - Reinstate Monica, Nov 13 '20 at 20:37
Good point, Monica! I edited the original post to add my best solution so far. — st4co4, Nov 14 '20 at 07:57
I forgot to answer one question: showing CIs allows easy interpretation, i.e. whether or not the regions differ significantly in a given measure. But maybe this is mission impossible. — st4co4, Nov 14 '20 at 08:02
Have you tried a parallel coordinates plot? In R, it is available as the function parcoord in the MASS package, which ships with R. — cdalitz, Nov 14 '20 at 08:14
You state that your aim is to find a plot that will allow you to rank the regions. No plot will do that for you. You need to decide how to rank regions based on a substantive understanding of the variables (see: Creating an index of quality from multiple variables to enable rank ordering). — gung - Reinstate Monica, Nov 14 '20 at 13:49
Thank you for the link! Seems that just finding a visualization method would be a good start. "Ranking" can later be done in discussion. — st4co4, Nov 14 '20 at 14:00

score 2 · Answer 1 · answered Nov 14 '20 at 12:53

Consider a multivariate analysis as a descriptive statistic. Use canonical variates to simultaneously predict the 14 indicator variables for the 15 regions from the three descriptive variables. The first canonical variate is the linear combination of the descriptive variables having the maximum $R^2$ with the best linear combination of the indicators. Then interpret the region loadings to see to what extent regions differ from one another, and interpret the coefficients of the three variables to see how regions vary with respect to those variables.
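A rough sketch of what this could look like in R, under a strong assumption: canonical variates need several observations per region, whereas the question only provides one estimate (plus a CI) per region. The unit-level data below are therefore simulated and purely hypothetical (20 units per region, arbitrary noise levels). Summarising regions by their mean score on the first canonical variate is a simplification chosen here, not necessarily the exact procedure the answer has in mind.

```r
set.seed(1)

# Hypothetical unit-level data: 20 observations in each of 15 regions
n_per <- 20
region <- factor(rep(LETTERS[1:15], each = n_per))
mu <- cbind(prop = rnorm(15, 50, 20),      # assumed region means
            income = rnorm(15, 10, 4),
            sd = rnorm(15, 2, 1))
noise <- cbind(rnorm(15 * n_per, 0, 10),   # assumed within-region spread
               rnorm(15 * n_per, 0, 2),
               rnorm(15 * n_per, 0, 0.5))
X <- mu[as.integer(region), ] + noise

# 14 indicator (dummy) variables; region A is the reference level
G <- model.matrix(~ region)[, -1]

# Canonical correlation between the three variables and the region indicators
cc <- cancor(X, G)
cc$cor         # canonical correlations (at most three)
cc$xcoef[, 1]  # coefficients of the three variables on the first canonical variate

# Score each unit on the first canonical variate, then average within region
scores <- drop(scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1])
region_means <- tapply(scores, region, mean)
sort(region_means)
```

The sorted region means give one possible one-dimensional ordering of the regions, and `cc$xcoef[, 1]` shows how much each variable contributes to it. The raw coefficients depend on the variables' scales, so standardising `X` first may make them easier to compare.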

A small worked example would probably help a lot here. — gung - Reinstate Monica, Nov 14 '20 at 13:15
