
Let's say there is a continuous variable y and a grouping (factor) variable x with 3 different levels: a, b, and c.

x = sample(letters[1:3], size = 300, replace = T)
y = rnorm(300)

I want to test two contrasts: (1) y for group a versus the mean of y for group b and c and (2) y for group a versus y for group c.
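(For concreteness, here are the two contrasts computed directly from the group means — just a sanity baseline, not the model-based test; the seed is an arbitrary addition for reproducibility:)

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- sample(letters[1:3], size = 300, replace = TRUE)
y <- rnorm(300)
means <- tapply(y, x, mean)

means[["a"]] - (means[["b"]] + means[["c"]]) / 2  # contrast (1): a vs mean of b and c
means[["a"]] - means[["c"]]                       # contrast (2): a vs c
```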

Below is how I do this:

contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 3, ncol = 2)

lm(y ~ x, contrasts = list(x = contr_mat)) %>% summary

However, according to here and here, the correct way to do this is:

library(MASS)
lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary

which produces different results in terms of both the regression coefficients and P values. But none of the above links explains "why" it is the correct way. The ?lm documentation also does not mention anything about ginv. Could someone explain why I should take the generalized inverse (ginv) of transpose (t) of the original contrast matrix to get the correct contrast matrix?

2 Answers


What you want to specify as x in the list passed to the contrasts argument of lm is the matrix $X_\star$ which relates the regression coefficients $\left(\beta_0, \beta_1, \beta_2\right)^\top$ to the group means $\left(\mu_a, \mu_b, \mu_c\right)^\top$ via $$ \begin{pmatrix} \mathbf{1}_3 & X_\star \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix}. $$

To make $\left(\beta_1, \beta_2\right)^\top$ correspond to the two contrasts of the group means you want to test, $X_\star$ must be chosen in such a way that $$ \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix} = C \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix} = C \begin{pmatrix} \mathbf{1}_3 & X_\star \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} $$ with the contrast matrix $$ C = \begin{pmatrix} 1 & -0.5 & -0.5\\ 1 & 0 & -1 \end{pmatrix} $$ holds.
If $X_\star=C^+$, where $C^+$ denotes the Moore–Penrose inverse of $C$ and $C$ has linearly independent rows (which it should have), then $C^+$ is a right inverse of $C$ and we have $$ C \begin{pmatrix} \mu_a\\ \mu_b\\ \mu_c \end{pmatrix} = C \begin{pmatrix} \mathbf{1}_3 & C^+ \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \mathbf{0}_2 & I_2 \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix} $$
as desired.


In your $\mathsf{R}$ code contr_mat is the transpose of $C$. You can thus calculate x $\equiv X_\star \equiv C^+$ as the Moore–Penrose inverse of the transpose of contr_mat via x = MASS::ginv(t(contr_mat)).
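A quick numerical check of the right-inverse property (a sketch; `C` here is the $2 \times 3$ contrast matrix from the question, i.e. the transpose of `contr_mat`):

```r
library(MASS)  # for ginv()

C <- rbind(c(1, -0.5, -0.5),   # mu_a - (mu_b + mu_c)/2
           c(1,  0,   -1))     # mu_a - mu_c
X_star <- ginv(C)              # Moore-Penrose inverse, same as ginv(t(contr_mat))

round(C %*% X_star, 10)            # right inverse: C %*% X_star = I_2
round(C %*% cbind(1, X_star), 10)  # = (0 | I_2), so (beta_1, beta_2) are the contrasts
```

Because both rows of $C$ sum to zero, $C\mathbf{1}_3 = \mathbf{0}_2$, which is why the first column of the second product vanishes.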

statmerkur
  • 5,950

The contrast matrix you use is not square, so it does not have a regular inverse. You need to use the Moore-Penrose (pseudo-)inverse of the contrast matrix, which is calculated by MASS::ginv().

Starting from your contrast matrix, what you want to calculate is:

\begin{equation} \begin{pmatrix} 1 & -0.5 & -0.5 \\ 1 & 0 & -1 \\ \end{pmatrix}\ \begin{pmatrix}\mu_{a} \\\mu_{b} \\\mu_{c}\end{pmatrix} = \begin{pmatrix} \mu_{a}-\frac{(\mu_{b}+\mu_{c})}{2} \\\mu_{a}-\mu_{c} \end{pmatrix} \end{equation}

These are the comparisons you want, right? If this is the case, your contrast matrix is actually

t(contr_mat)
     [,1] [,2] [,3]
[1,]    1 -0.5 -0.5
[2,]    1  0.0 -1.0

So why do you need to take the inverse? Let's construct the inverse, with an added intercept:

C1 <- cbind(1, ginv(t(contr_mat)))
round(C1, 2)
     [,1]  [,2] [,3]
[1,]    1  0.67    0
[2,]    1 -1.33    1
[3,]    1  0.67   -1

If we invert C1 back, we get:

round(ginv(C1), 2)
     [,1]  [,2]  [,3]
[1,] 0.33  0.33  0.33
[2,] 1.00 -0.50 -0.50
[3,] 1.00  0.00 -1.00

Which is your original contrast matrix, with an added grand mean row:

\begin{equation} C\mu = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1 & -0.5 & -0.5 \\ 1 & 0 & -1 \\ \end{pmatrix}\ \begin{pmatrix} \mu_{a} \\\mu_{b} \\\mu_{c}\end{pmatrix} = \begin{pmatrix} \frac{\mu_{a}+\mu_{b}+\mu_{c}}{3} \\ \mu_{a}-\frac{(\mu_{b}+\mu_{c})}{2} \\ \mu_{a}-\mu_{c} \end{pmatrix} \end{equation}

So we took your original contrast matrix, transposed it, took the pseudoinverse, and added an intercept column. If we invert this matrix, we recover your contrasts.

Why did we invert your matrix? Note that the matrix we built above (with C1 <- cbind(1, ginv(t(contr_mat))) ) is square and full rank, so it has a regular inverse. From here on, let $C$ denote the augmented $3 \times 3$ contrast matrix (with the grand-mean row), so that C1 is $C^{-1}$, because a property of a regular inverse is:

$C^{-1} C = CC^{-1} = I$, with $I$ the identity matrix.
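This property can be checked numerically (a sketch, rebuilding `contr_mat` and `C1` as defined above):

```r
library(MASS)  # for ginv()

contr_mat <- matrix(c(1, -0.5, -0.5,
                      1, 0, -1),
                    nrow = 3, ncol = 2)
C1 <- cbind(1, ginv(t(contr_mat)))  # square and full rank
C  <- solve(C1)                     # regular inverse; equals round(ginv(C1), 10)

round(C %*% C1, 10)  # identity matrix I_3
round(C1 %*% C, 10)  # identity matrix I_3
round(C[1, ], 10)    # grand-mean row: 1/3 1/3 1/3
```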

The model supporting your comparisons and used with the lm() function can be expressed as:

$y = X\mu + \epsilon$

Where X is the design matrix, $\mu$ the vector of means, and $\epsilon$ the random component with normal distribution.

If X is the design matrix for the cell means model, lm() estimates the means using the least-squares equation:

$\hat{\mu} =(X^{\prime }X)^{-1}\ X^{\prime }y\\$

Where $X^{\prime}$ is the transpose of $X$. If we now modify the design matrix and coefficients to incorporate C, your contrast matrix, we get:

\begin{equation} y = X\mu + \epsilon = XI\mu + \epsilon = X(C^{-1}C)\mu + \epsilon = (XC^{-1})(C\mu) + \epsilon \end{equation}

So you can use the modified design matrix $(XC^{-1})$ to evaluate your contrasts $(C\mu)$, using the least square method, in the context of your original model.

If you use the modified design matrix $(XC^{-1})$, the least-squares equation becomes: \begin{equation} \widehat{C\mu} = \left[(XC^{-1})^{\prime}(XC^{-1})\right]^{-1}(XC^{-1})^{\prime}y \end{equation}

And you get your desired contrasts $C\hat{\mu}$. That's why you need to calculate the inverse of $C$: to evaluate the contrasts $C\mu$.

To show that it works in R with your example:

x = sample(letters[1:3], size = 300, replace = T)
y = rnorm(300)
contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 3, ncol = 2)

Let's build the modified design matrix:

X <- model.matrix(~x + 0)                # model matrix for the cell means model
X1 <- X %*% cbind(1,ginv(t(contr_mat)))  # this is the modified model matrix, using the pseudoinverse.

And we solve either by the least-squares equation directly or with lm(). lm() takes the inverse of your contrast matrix as its contrasts argument, adds the intercept term, and then evaluates the contrasts by least squares:

# least-squares equation:

solve(t(X1) %*% X1) %*% t(X1) %*% y
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

lm() with modified design matrix

lm(y ~ X1 + 0) %>% summary

Call:
lm(formula = y ~ X1 + 0)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1953 -0.7073  0.0250  0.7382  2.7251

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
X11 -0.06525    0.06049  -1.079    0.282
X12  0.03628    0.12596   0.288    0.774
X13  0.04821    0.14336   0.336    0.737

lm() with your contrasts

lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary

Call:
lm(formula = y ~ x, contrasts = list(x = ginv(t(contr_mat))))

Residuals:
    Min      1Q  Median      3Q     Max
-3.1953 -0.7073  0.0250  0.7382  2.7251

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06525    0.06049  -1.079    0.282
x1           0.03628    0.12596   0.288    0.774
x2           0.04821    0.14336   0.336    0.737

The last two rows contain the comparisons you want:

means <- tapply(X = y, INDEX = factor(x), FUN = mean)
C <- rbind(1/3, t(contr_mat))  # augmented contrast matrix: grand-mean row plus the two contrasts
C %*% means
            [,1]
[1,] -0.06524626
[2,]  0.03627632
[3,]  0.04820965

(means[1] + means[2] + means[3])/3
-0.06524626
means[1] - (means[2] + means[3])/2
0.03627632
means[1] - means[3]
0.04820965

user38
  • 33