
I have a data set of 30 individuals, with each individual categorised against 3 parameters. For two of the parameters, A and B, an individual is either in the parameter or out of it.

A = 0 or A = 1
B = 0 or B = 1

For the third parameter (C), an individual can have a value of 0 (not in the parameter), or 1, 2 or 3. The number indicates how strongly an individual exhibits that parameter: 0 is not at all, 3 is the most.

C = 0, 1, 2, 3

An example of the set would be:

Individual #   A     B     C
1              0     1     3
2              1     1     1
3              0     0     2
...            ...   ...   ...
30             1     1     2

I am looking for ideas about how these data might be displayed so as to give a picture of the set. If there were only two parameters, I could see how a heatmap might be used, but the introduction of the third parameter makes this a challenge.

Any thoughts gratefully received. I'm not looking for solutions, but for ideas to help me start.

Example dataset

llewmihs

1 Answer


You basically have two issues:

  1. your data are categorical, and
  2. your data are multidimensional.

The basic plot for two categorical variables is a mosaic plot (with many levels in the variables, a correspondence analysis might be useful; see: Which is the best visualization for contingency tables? for discussions of both).
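
As a minimal sketch of a basic mosaic plot in base R: the data frame `toy` below is simulated placeholder data laid out like the question's A, B and C columns (an assumption for illustration, not the real data set).

```r
# Simulated stand-in for the question's 30 individuals (NOT the real data)
set.seed(1)
toy <- data.frame(
  A = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  B = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  C = factor(sample(0:3, 30, replace = TRUE), levels = 0:3)
)

# 2 x 4 contingency table of counts for one pair of variables
tab <- table(A = toy$A, C = toy$C)

# Tile areas are proportional to the cell counts; shade = TRUE colours
# the tiles by their residuals from an independence model
mosaicplot(tab, shade = TRUE, main = "A vs. C (simulated data)")
```

Swapping in `toy$B` for either variable gives the other pairwise views.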

With multidimensional data, the usual approach is a plot matrix (prototypically, a scatterplot matrix), or perhaps a coplot (see: First quick glance at a dataset). What we need is to adapt those plots for categorical data. First, we can just make a mosaic plot matrix. I'll demonstrate this using R:

library(vcd)  # we'll need this package
d = read.table(text="Group  P.status    S.Status    E.Status
                     A  1   2   0
                     ...
                     C  0   0   0", header=TRUE)  # read in the data
for(j in 1:4){  d[,j] = factor(d[,j])  };  rm(j)  # turn the variables into factors

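dev.new()  # open a new graphics device (portable; windows() exists only on Windows)
  pairs(structable(d))  # mosaic plot matrix of all pairs of variables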


On the other hand, a coplot is just a series of 2D plots constructed using overlapping strata from a third (and/or further) variable. With categorical data, 'overlapping' doesn't make sense, but it's easy to just make two mosaic plots for S.Status vs. P.status, one at each of the two levels of E.Status:

dev.new(height=4, width=7)  # portable replacement for windows(), which is Windows-only
  layout(matrix(1:2, nrow=1))
  mosaicplot(with(d[d$E.Status=="0",], table(S.Status, P.status)), shade=TRUE)
  mosaicplot(with(d[d$E.Status=="1",], table(S.Status, P.status)), shade=TRUE)

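
The same idea extends to a conditioning variable with more than two levels, such as the question's C (0 to 3). The following is only a sketch in base R, again using a simulated placeholder data frame `toy` (an assumption, not the real data), that draws one A vs. B mosaic plot per level of C:

```r
# Simulated stand-in for the question's 30 individuals (NOT the real data)
set.seed(1)
toy <- data.frame(
  A = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  B = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  C = factor(sample(0:3, 30, replace = TRUE), levels = 0:3)
)

# One 2 x 2 table of A vs. B for each level of the conditioning variable C
lev <- levels(toy$C)
strata <- lapply(lev, function(k) {
  s <- toy[toy$C == k, ]
  table(A = s$A, B = s$B)
})
names(strata) <- lev

# Draw the panels side by side, skipping any stratum with no individuals
op <- par(mfrow = c(1, length(lev)))
for (k in lev) {
  if (sum(strata[[k]]) > 0) mosaicplot(strata[[k]], main = paste("C =", k))
}
par(op)  # restore the previous layout
```

With only 30 individuals, each panel rests on a handful of observations, so I have left out `shade=TRUE` here.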

You could also try to make a Sankey plot as an adaptation of a parallel coordinates plot for categorical data, but the aforementioned would be my first choice. I demonstrate a simple one here: Chart suggestions for data flow.
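
Whichever package draws the Sankey plot, it will typically want the flows between adjacent variables as a table of links. A rough sketch of tabulating those in base R follows; the data frame `toy` is simulated placeholder data, and the column names `source`, `target` and `value` are my own assumptions rather than any particular package's requirement:

```r
# Simulated stand-in for the question's 30 individuals (NOT the real data)
set.seed(1)
toy <- data.frame(
  A = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  B = factor(sample(0:1, 30, replace = TRUE), levels = 0:1),
  C = factor(sample(0:3, 30, replace = TRUE), levels = 0:3)
)

# Count how many individuals move from each level of one variable
# to each level of the next (A -> B, then B -> C)
count_links <- function(from, to) {
  as.data.frame(table(source = from, target = to),
                responseName = "value", stringsAsFactors = FALSE)
}
links <- rbind(
  count_links(paste0("A=", toy$A), paste0("B=", toy$B)),
  count_links(paste0("B=", toy$B), paste0("C=", toy$C))
)
links <- links[links$value > 0, ]  # drop flows that never occur
print(links)
```

Each row is one band of the diagram, with `value` giving its width.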