Compositional Data in R

Question

I'm writing a work on the Aitchison geometry for compositional data and I have seen an Image I want to reproduce in R.

I work with the "compositions" library and I want to understand how to plot the grid they have used in the image below after centering the data.
Here is my code:

library(compositions)
sigma = 1
mu = c(0,0)
mymn3 <- ilrInv(c(2,0))
myvr3 <- ilrvar2clr(sigma*diag(1, 2,2))
xr= rnorm.acomp(n=50, mean=mymn3, var=myvr3)
xr_center = scale(xr,center=TRUE,scale=FALSE)
par(mfrow=c(1,2))
plot(mymn3,pch=".")
plot(mean.acomp(xr),pch=17)
plot(xr, add= TRUE, pch=19, cex=0.5)
plot(acomp(sa.lognormals))
plot(mymn3,pch=".")
plot(xr_center, add= TRUE, pch=19, cex=0.5)

In this code I define a variable $xr$ that generates a sample of compositions following a normal distribution. $xr_center$ translates this sample, that situates in the corner of the simplex, to the center of the simplex. This is the image I got How can I introduce now the grid you see in the picture above?

source of the image: STATISTISCHE DISKUSSIONSBEITRÄGE

whuber · Answer 1 · 2022-05-31T21:46:59.607

This image represents a projective transformation of the simplex (a relatively simple non-linear bivariate transformation). Apparently its purpose is to center the data on the simplex.

Note well that it is not any version of the ILR (isometric log ratio) transformation. The ILR can place the original points far outside any fixed simplex and it will not map the line segments of a grid into line segments: the images would be curved.

I will give a brief analysis and example.

A projective transformation of the plane is the one used by painters to establish realistic perspective. This means, in particular, that it transforms any line into another line (albeit situated arbitrarily far away "at infinity.") It follows that the most general projective transformation takes the form

$$(x,y) \to \left(\frac{ax + by + c}{dx + ey + f}, \frac{a^\prime x + b^\prime y + c^\prime}{dx + ey + f}\right)$$

where $a,b,\ldots, e, f$ are nine nearly arbitrary coefficients defined up to a common factor. (There's a mild restriction imposed by the need for the transformation to be invertible.)

The transformations that preserve the interior of a triangle will permute its vertices. If we also insist they fix the vertices, then each vertex imposes two linear constraints on the coefficients, leaving a family of projective transformations with at least $9-1 - 3(2) = 2$ parameters that fix the triangle. It turns out this family is exactly two-dimensional (the three sets of conditions are independent).

Because all triangles are projectively equivalent, I used a triangle with simple vertex coordinates to find this family: namely, the one with vertices at $(\pm 1,0)$ and $(0,1).$ In this case the transformations are of the form

$$(x,y) \to \left(\frac{x + b(y-1)}{-bx + ey + 1}, \frac{(e+1)y}{-bx + ey + 1}\right)$$

with $b\ne\pm 1$ and $e\ne 1.$ Necessarily $b^2\lt 1$ and $e \gt -1$ so that the interior of the triangle stays inside.

Here is a rogue's gallery of such transformations. The columns show images of the triangle under transformations with $b=-1/2,0,1/4,$ and $3/4.$ The rows vary $e$ from $-1/2$ to $0$ to $2.$ The identity transformation, $b=0$ and $e=0,$ appears second from left in the middle row, showing the original triangular grid. All the other grids were drawn by transforming this grid. (Each line segment in the grid has endpoints along the triangle's boundary. The image of such a segment is found by connecting the transformed endpoints.)

The triangle shown in this figure is drawn accurately: its vertices are at $(\pm1,0)$ and $(0,1).$ If you get the impression you are looking at the same gridded triangle from different vantage points above the plane of the image, you will be correct: that's precisely what a projective transformation does.

Now, given data $(x_i,y_i)$ constrained to lie inside this triangle, we can find $b$ and $e$ for which the mean of the transformed values is at its barycenter $(0,1/3).$ This is a straightforward nonlinear optimization problem that needs to be numerically solved. The illustration in the question corresponds closely with the coefficients $b=1/2$ and $e=3/2.$

The simplex with vertices $(\pm \sqrt{1/2},0)$ and $(0,1)$ is obtained easily by scaling the original triangle horizontally by a factor of $\sqrt{1/2}$ Up to an isothety (uniform change of scale), this creates geometrically accurate plots. If you must, you can further shrink this simplex uniformly by a factor of $\sqrt{1/2}$ to make it a unit simplex. For visualization this last step doesn't matter.

Finally, starting with 3D compositional data you need to project them isometrically into the plane. This is described in your reference and more details can be found here on CV at https://stats.stackexchange.com/a/259223/919.

As an example, I generated $100$ data points in a simplex and plotted them as shown at left below. The projective transformation preserving the triangle's vertices and interior that places the data means at the barycenter is applied to all elements of this image, as shown at right.

This R code supplies all the computational and graphical details.

#
# Projective transformations preserving the triangle (1,0), (0,1), (-1,0).
# Returns `A`, the 3 X 3 matrix with rows (a,b,c), (a',b',c'), and (d,e,1).
#
xform <- function(b, e) matrix(c(1,0,-b ,b,e+1,e, -b,0,1), 3) # Want |b| < 1, e > -1
#
# Apply the projective transformation represented by `A` to the rows of `x`.
#
f <- function(x, A) {
  B <- A %*% t(cbind(x, 1))
  y <- t(apply(B, 2, function(y) if(y[3] != 0) y[1:2] / y[3] else c(Inf, Inf)))
}
#
# Create some random points in a simplex to serve as example data.
#
N <- 100
set.seed(17)
p <- apply(matrix(rnorm(2*N, c(0.2, 0.45), 0.1) %% 1, 2), 2,
           function(x) c(sqrt(1/2), 1) * 
                       (if(x[2] > x[1]) c(1-x[2], x[1]) else c(x[1]-1,x[2])))
pts <- t(p)
#
# Reproduce the illustration in the question.
#
x.bar <- mean(pts[,1])
y.bar <- mean(pts[,2])
#
# Find transformation parameters that place the point of averages at (0, 1/3).
#
b <- x.bar / (1 - y.bar)                   # Approximate solution, as a start value
e <- (1 - b * x.bar) / (2 * y.bar) - 3/2   # Approximate solution, as a start value
be <- nlm(function(be) 
            sum((colMeans(f(pts, xform(be[1], be[2])))-c(0, 1/3))^2), c(b, e))$estimate
b <- be[1]
e <- be[2]
A <- xform(b, e)
#
# Three sets of parallel line segments in the triangle.
# `x` contains their origins while `x.` contains their destinations.
#
n <- 7 `n-2` line segments will appear in the triangular grid
x <- rbind(
  cbind(seq(-1, 1, length.out=n), 0),
  cbind(seq(-1, 0, length.out=n), seq(0, 1, length.out=n)),
  cbind(seq(1, -1, length.out=n), 0))
x. <- rbind(
  cbind(seq(0, 1, length.out=n), seq(1, 0, length.out=n)),
  cbind(seq(1, 0, length.out=n), seq(0, 1, length.out=n)),
  cbind(seq(0, -1, length.out=n), seq(1, 0, length.out=n)))
#
# Show the triangle, squeezed by sqrt(1/2) horizontally to be equilateral, before
# and after transformation.
#
par(mfrow=c(1,2))
cols <- rainbow(3, .8, .8)
for (A in list(diag(1,3), A)) {
  # 
  # Set up a plot showing the interior of the triangle.
  #
  plot(c(-1,1) * sqrt(1/2), c(0,1), type="n", bty="n", asp=1, 
       xaxt="n", yaxt="n", xlab="", ylab="")
  polygon(c(-1,1,0,-1) * sqrt(1/2), c(0,0,1,0), col=gray(.975), border=NA)
  #
  # Transform the line segments.
  #
  y <- f(x, A) %*% diag(c(sqrt(1/2), 1))
  y. <- f(x., A) %*% diag(c(sqrt(1/2), 1))
  #
  # Plot the segments, the points, and a boundary triangle.
  #
  sapply(seq_len(nrow(y)), function(i) lines(c(y[i,1], y.[i,1]), c(y[i,2], y.[i,2]), 
                                             col=cols[ceiling(i/n)]))
  points(f(pts, A) %*% diag(c(sqrt(1/2), 1)), bg=gray(0, .1), pch=21)
  lines(c(-1,1,0,-1) * sqrt(1/2), c(0,0,1,0), lwd=2)
}
par(mfrow=c(1,1))

Compositional Data in R

1 Answers1