
Let's say there is a continuous variable y and a grouping (factor) variable x with 3 different levels: a, b, and c.

x = sample(letters[1:3], size = 300, replace = T)
y = rnorm(300)

I want to test two contrasts: (1) y for group a versus the mean of y for group b and c and (2) y for group a versus y for group c.
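(For concreteness, here are the two contrasts computed directly from the group means — just a sanity baseline, not the model-based test; the seed is an arbitrary addition for reproducibility:)

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- sample(letters[1:3], size = 300, replace = TRUE)
y <- rnorm(300)
means <- tapply(y, x, mean)

means[["a"]] - (means[["b"]] + means[["c"]]) / 2  # contrast (1): a vs mean of b and c
means[["a"]] - means[["c"]]                       # contrast (2): a vs c
```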

Below is how I do this:

contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 3, ncol = 2)

lm(y ~ x, contrasts = list(x = contr_mat)) %>% summary

However, according to here and here, the correct way to do this is:

library(MASS)
lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary

which produces different results in terms of both the regression coefficients and P values. But none of the above links explains "why" it is the correct way. The ?lm documentation also does not mention anything about ginv. Could someone explain why I should take the generalized inverse (ginv) of transpose (t) of the original contrast matrix to get the correct contrast matrix?

2 Answers


What you want to specify as x in the list passed to the contrasts argument of lm is the matrix $X_\star$ which relates the regression coefficients $\left(\beta_0, \beta_1, \beta_2\right)^\top$ to the group means $\left(\mu_a, \mu_b, \mu_c\right)^\top$ via $$ \begin{pmatrix} \mathbf{1}_3 & X_\star \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix}. $$

To make $\left(\beta_1, \beta_2\right)^\top$ correspond to the two contrasts of the group means you want to test, $X_\star$ must be chosen in such a way that $$ \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix} = C \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix} = C \begin{pmatrix} \mathbf{1}_3 & X_\star \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} $$ with the contrast matrix $$ C = \begin{pmatrix} 1 & -0.5 & -0.5\\ 1 & 0 & -1 \end{pmatrix} $$ holds.
If $X_\star=C^+$, where $C^+$ denotes the Moore–Penrose inverse of $C$ and $C$ has linearly independent rows (which it should have), then $C^+$ is a right inverse of $C$ and we have $$ C \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix} = C \begin{pmatrix} \mathbf{1}_3 & C^+ \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \mathbf{0}_2 & I_2 \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix} $$
as desired.


In your $\mathsf{R}$ code contr_mat is the transpose of $C$. You can thus calculate x $\equiv X_\star \equiv C^+$ as the Moore–Penrose inverse of the transpose of contr_mat via x = MASS::ginv(t(contr_mat)).
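A quick numerical check of the right-inverse property (a sketch; `C` here is the $2 \times 3$ contrast matrix from the question, i.e. the transpose of `contr_mat`):

```r
library(MASS)  # for ginv()

C <- rbind(c(1, -0.5, -0.5),   # mu_a - (mu_b + mu_c)/2
           c(1,  0,   -1))     # mu_a - mu_c
X_star <- ginv(C)              # Moore-Penrose inverse, same as ginv(t(contr_mat))

round(C %*% X_star, 10)            # right inverse: C %*% X_star = I_2
round(C %*% cbind(1, X_star), 10)  # = (0 | I_2), so (beta_1, beta_2) are the contrasts
```

Because both rows of $C$ sum to zero, $C\mathbf{1}_3 = \mathbf{0}_2$, which is why the first column of the second product vanishes.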

statmerkur
  • 5,950

The contrast matrix you use is not square, so it does not have a regular inverse. You need to use the Moore-Penrose (pseudo-)inverse of the contrast matrix, which is calculated by MASS::ginv().

Starting from your contrast matrix, what you want to calculate is:

\begin{equation} \begin{pmatrix} 1 & -0.5 & -0.5 \\ 1 & 0 & -1 \\ \end{pmatrix}\ \begin{pmatrix}\mu_{a} \\\mu_{b} \\\mu_{c}\end{pmatrix} = \begin{pmatrix} \mu_{a}-\frac{(\mu_{b}+\mu_{c})}{2} \\\mu_{a}-\mu_{c} \end{pmatrix} \end{equation}

These are the comparisons you want, right? If this is the case, your contrast matrix is actually

t(contr_mat)
     [,1] [,2] [,3]
[1,]    1 -0.5 -0.5
[2,]    1  0.0 -1.0

So why do you need to take the inverse? Let's construct the inverse, with an added intercept:

C1 <- cbind(1, ginv(t(contr_mat)))
round(C1, 2)
     [,1]  [,2] [,3]
[1,]    1  0.67    0
[2,]    1 -1.33    1
[3,]    1  0.67   -1

If we invert C1 back, we get:

round(ginv(C1), 2)
     [,1]  [,2]  [,3]
[1,] 0.33  0.33  0.33
[2,] 1.00 -0.50 -0.50
[3,] 1.00  0.00 -1.00

Which is your original contrast matrix, with an added grand mean row:

\begin{equation} C\mu = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1 & -0.5 & -0.5 \\ 1 & 0 & -1 \\ \end{pmatrix}\ \begin{pmatrix} \mu_{a} \\\mu_{b} \\\mu_{c}\end{pmatrix} = \begin{pmatrix} \frac{\mu_{a}+\mu_{b}+\mu_{c}}{3} \\ \mu_{a}-\frac{(\mu_{b}+\mu_{c})}{2} \\ \mu_{a}-\mu_{c} \end{pmatrix} \end{equation}

So we took your original contrast matrix, transposed it, took the pseudoinverse, and added an intercept column. If we invert this matrix, we recover your contrasts.

Why did we invert your matrix? Note that the matrix we built above (with C1 <- cbind(1, ginv(t(contr_mat))) ) is square and full rank, so it has a regular inverse. From here on, let $C$ denote the augmented $3 \times 3$ contrast matrix (with the grand-mean row), so that C1 is $C^{-1}$, because a property of a regular inverse is:

$C^{-1} C = CC^{-1} = I$, with $I$ the identity matrix.
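This property can be checked numerically (a sketch, rebuilding `contr_mat` and `C1` as defined above):

```r
library(MASS)  # for ginv()

contr_mat <- matrix(c(1, -0.5, -0.5,
                      1, 0, -1),
                    nrow = 3, ncol = 2)
C1 <- cbind(1, ginv(t(contr_mat)))  # square and full rank
C  <- solve(C1)                     # regular inverse; equals round(ginv(C1), 10)

round(C %*% C1, 10)  # identity matrix I_3
round(C1 %*% C, 10)  # identity matrix I_3
round(C[1, ], 10)    # grand-mean row: 1/3 1/3 1/3
```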

The model supporting your comparisons and used with the lm() function can be expressed as:

$y = X\mu + \epsilon$

Where X is the design matrix, $\mu$ the vector of means, and $\epsilon$ the random component with normal distribution.

If X is the design matrix for the cell means model, lm() estimates the means using the least-squares equation:

$\hat{\mu} =(X^{\prime }X)^{-1}\ X^{\prime }y\\$

Where $X^{\prime}$ is the transpose of $X$. If we now modify the design matrix and coefficients to incorporate C, your contrast matrix, we get:

\begin{equation} y = X\mu + \epsilon = XI\mu + \epsilon = X(C^{-1}C)\mu + \epsilon = (XC^{-1})(C\mu) + \epsilon \end{equation}

So you can use the modified design matrix $(XC^{-1})$ to evaluate your contrasts $(C\mu)$, using the least square method, in the context of your original model.

If you use the modified design matrix $(XC^{-1})$, the least-squares equation becomes: \begin{equation} \widehat{C\mu} = \left[(XC^{-1})^{\prime}(XC^{-1})\right]^{-1}(XC^{-1})^{\prime}y \end{equation}

And you get your desired contrasts $C\hat{\mu}$. That's why you need to calculate the inverse of $C$: to evaluate the contrasts $C\mu$.

To show that it works in R with your example:

x = sample(letters[1:3], size = 300, replace = T)
y = rnorm(300)
contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 3, ncol = 2)

Let's build the modified design matrix:

X <- model.matrix(~x + 0)                # model matrix for the cell means model
X1 <- X %*% cbind(1,ginv(t(contr_mat)))  # this is the modified model matrix, using the pseudoinverse.

And we solve either by the least-squares equation directly or with lm(). lm() takes the inverse of your contrast matrix as its contrasts argument, adds the intercept term, and then evaluates the contrasts by least squares:

# least-squares equation:

solve(t(X1) %*% X1) %*% t(X1) %*% y
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

lm() with modified design matrix

lm(y ~ X1 + 0) %>% summary

Call:
lm(formula = y ~ X1 + 0)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1953 -0.7073  0.0250  0.7382  2.7251

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
X11 -0.06525    0.06049  -1.079    0.282
X12  0.03628    0.12596   0.288    0.774
X13  0.04821    0.14336   0.336    0.737

lm() with your contrasts

lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary

Call:
lm(formula = y ~ x, contrasts = list(x = ginv(t(contr_mat))))

Residuals:
    Min      1Q  Median      3Q     Max
-3.1953 -0.7073  0.0250  0.7382  2.7251

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06525    0.06049  -1.079    0.282
x1           0.03628    0.12596   0.288    0.774
x2           0.04821    0.14336   0.336    0.737

The last two rows contain the comparisons you want:

means <- tapply(X = y, INDEX = factor(x), FUN = mean)
C <- rbind(1/3, t(contr_mat))  # augmented contrast matrix: grand-mean row plus the two contrasts
C %*% means
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

(means[1] + means[2] + means[3])/3
-0.06524626
means[1] - (means[2] + means[3])/2
0.03627632
means[1] - means[3]
0.04820965

user38
  • 33