
We are given a compositional data set, where the response is $$Y = [y_1, \ldots, y_n], \quad \sum_{i=1}^{n} y_i = 1, \quad y_i \in [0,1].$$

I intend to do regression; however, before that, I would like to get a feel for the codependence structure of $Y$.

What is the right way to do this?

In particular, I think the usual "scatterplot" approach will not work: if I plot $y_1$ against $y_2$, I may see a positive curve, but this could be an artifact of the sum constraint.
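A quick way to see why the constraint distorts scatterplots: because $\sum_j y_j = 1$ is constant, its covariance with any single part is zero, so for each $i$
$$0 = \operatorname{Cov}\Big(y_i, \sum_{j=1}^{n} y_j\Big) = \operatorname{Var}(y_i) + \sum_{j \neq i} \operatorname{Cov}(y_i, y_j).$$
Every row of the covariance matrix of $Y$ therefore sums to zero, whatever process generated the data. The sketch below (Python with NumPy; the lognormal "raw amounts" and the helper name `closure` are illustrative assumptions, not anything from the question) draws three independent positive amounts, rescales each row to sum to one, and compares the correlation of parts 1 and 2 before and after that rescaling.

```python
import numpy as np

def closure(x):
    """Rescale each row of a positive matrix so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical data: three independent, positive "absolute amounts" per observation.
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(5000, 3))
comp = closure(raw)

# Correlation between parts 1 and 2, before and after closing the rows.
raw_corr = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
comp_corr = np.corrcoef(comp[:, 0], comp[:, 1])[0, 1]
print(f"correlation of raw parts 1, 2:    {raw_corr:+.3f}")
print(f"correlation of closed parts 1, 2: {comp_corr:+.3f}")
```

The raw correlation should sit near zero, while the closed one is clearly negative: three exchangeable parts share one variance $v$ and one covariance $c$, so the identity above gives $v + 2c = 0$ and a population correlation of $-1/2$, even though the raw amounts are independent.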

I could apply log-ratios to the data, but then how do I interpret the scatterplots of the log-ratios?
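If you go the log-ratio route, the two most common transforms are the additive log-ratio (alr), $\log(y_i / y_n)$ against a chosen reference part, and the centred log-ratio (clr), $\log(y_i / g(y))$ with $g(y)$ the geometric mean of the row. A minimal NumPy sketch, assuming strictly positive parts (zeros need separate treatment); the function names are my own, not from a particular package:

```python
import numpy as np

def clr(comp):
    """Centred log-ratio: log of each part over the geometric mean of its row."""
    logx = np.log(np.asarray(comp, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def alr(comp, ref=-1):
    """Additive log-ratio: log of each remaining part over the reference part `ref`."""
    logx = np.log(np.asarray(comp, dtype=float))
    others = np.delete(logx, ref, axis=1)  # drop the reference column
    return others - logx[:, [ref]]         # [ref] keeps a 2-D column for broadcasting

# Two illustrative compositions with three parts each.
comp = np.array([[0.2, 0.3, 0.5],
                 [0.1, 0.6, 0.3]])
print(clr(comp))
print(alr(comp))  # columns: log(y1/y3), log(y2/y3)
```

Ratios do not change when a row is rescaled, so these coordinates are unaffected by the closure, and a scatterplot of, say, $\log(y_1/y_3)$ against $\log(y_2/y_3)$ reads as a relationship between ratios rather than between raw shares. The clr coordinates of each row sum to zero, so their covariance matrix is singular; keep that in mind before passing them to methods that need an invertible covariance.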

What about biplots?

doso

1 Answer


The short answer is that there is no great way to do this. Inherently, the "codependence structure of Y" is obscured by the fact that you have compositional measurements.

That said, a compositional biplot (see section 5.4 here) is a great go-to tool for learning about the structure of your data and, under some circumstances, the relationships between variables. Note that there are a number of R packages that will help you build a compositional biplot; check out the package "compositions" on CRAN.
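For a sense of what a compositional biplot is built from, here is a minimal NumPy sketch of one common construction (a covariance-type biplot): clr-transform the data, centre each column, and take a rank-2 SVD. This is my own illustration, not the `compositions` package's implementation, and the scaling by $\sqrt{m-1}$ ($m$ = number of observations) is one convention among several. With this scaling, the length of the ray for part $i$ approximates the standard deviation of its clr coordinate, and the link between the tips of rays $i$ and $j$ approximates the standard deviation of $\log(y_i/y_j)$; the approximation is exact when the two plotted axes capture all the variance, as happens with three parts.

```python
import numpy as np

def clr(comp):
    """Centred log-ratio transform (each row minus its mean log)."""
    logx = np.log(np.asarray(comp, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def clr_biplot(comp, k=2):
    """Row markers, ray coordinates and variance share for a covariance-type clr biplot."""
    z = clr(comp)
    z = z - z.mean(axis=0, keepdims=True)         # centre each clr column
    m = z.shape[0]
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = np.sqrt(m - 1) * u[:, :k]            # one point per observation
    rays = vt[:k].T * s[:k] / np.sqrt(m - 1)      # one arrow per part, drawn from the origin
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, rays, explained

rng = np.random.default_rng(1)
raw = rng.lognormal(sigma=0.5, size=(200, 3))      # illustrative data only
comp = raw / raw.sum(axis=1, keepdims=True)
scores, rays, explained = clr_biplot(comp)

# Compare the link between rays 1 and 2 with the sd of log(y1/y2).
link_01 = np.linalg.norm(rays[0] - rays[1])
sd_01 = np.std(np.log(comp[:, 0] / comp[:, 1]), ddof=1)
print(f"variance shown: {explained:.3f}, link 1-2: {link_01:.4f}, sd log(y1/y2): {sd_01:.4f}")
```

To draw it, scatter `scores` and draw arrows from the origin to each row of `rays` (for example with matplotlib). With more than three parts, check `explained` before reading too much into the links.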

As GeoMatt22 suggests, you could also just calculate the covariance or variation array and plot the data in either a log-ratio basis or a ternary diagram, but I think this may be more difficult (both to compute and to interpret) than the compositional biplot, where "rays" and "links" (see the lecture notes linked above) have inherent and important meanings.
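The variation array mentioned here is also short to compute. A minimal NumPy sketch of its variance part (Aitchison's variation matrix), $T_{ij} = \operatorname{Var}\big(\log(y_i / y_j)\big)$, again assuming strictly positive parts; the function name and the toy data are my own:

```python
import numpy as np

def variation_matrix(comp):
    """T[i, j] = sample variance of log(x_i / x_j) across observations (rows)."""
    logx = np.log(np.asarray(comp, dtype=float))
    diffs = logx[:, :, None] - logx[:, None, :]  # shape (m, D, D): log-ratios for every pair
    return diffs.var(axis=0, ddof=1)

rng = np.random.default_rng(2)
a = rng.lognormal(size=500)
b = rng.lognormal(size=500)
raw = np.column_stack([2.0 * a, a, b])           # parts 1 and 2 are exactly proportional
comp = raw / raw.sum(axis=1, keepdims=True)
print(np.round(variation_matrix(comp), 3))
```

Small entries mean two parts keep a nearly constant ratio across samples (here $T_{12}$ is zero by construction); large entries mean their ratio varies a lot. Because it uses only ratios, the same matrix comes out whether you pass the raw amounts or the closed composition.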

jds