
I'm trying to understand why the values under 'estimate' from an emmeans contrast function differ from those of the default 'Estimate' values from, say, 'summary.lm()' in R. As an example, let's use Helmert contrasts (sometimes called 'reverse Helmert', but the R way of doing these suits my purpose) on a subset of the 'warpbreaks' data set.

Default R output

> data("warpbreaks")
> contrasts(warpbreaks$tension) <- contr.helmert(3)
> wbA <- subset(warpbreaks, wool == "A")
> wbA.lm1 <- lm(breaks ~ tension, data = wbA)

And the coefficients give me the differences that I'd expect based on the cell means

> coef(wbA.lm1)
(Intercept)    tension1    tension2 
  31.037037  -10.277778   -3.240741 

For the record the cell means are:

       L        M        H 
44.55556 24.00000 24.55556 

So that, for example, 'tension1' is the mean of the first two cell means minus the first cell mean: (44.556 + 24.000)/2 − 44.556 = −10.278.
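This relationship can be checked directly: the coefficients are what you get by solving the mean model $\mu = X\beta$, where $X$ is an intercept column bound to the Helmert coding. A small sketch using the cell means quoted above (the names `mu`, `X`, and `beta` are mine, not from the original code):

```r
# Cell means for wool == "A", as quoted above
mu <- c(L = 44.55556, M = 24.00000, H = 24.55556)

# Intercept column plus the Helmert coding matrix
X <- cbind(1, contr.helmert(3))

# Solve mu = X %*% beta for beta: reproduces coef(wbA.lm1)
beta <- solve(X, mu)
round(beta, 5)
#  31.03704 -10.27778  -3.24074
```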

emmeans estimates

Function to compute (reverse) Helmert:

helmert.emmc <- function(levs, ...) {
  M <- as.data.frame(contr.helmert(levs))
  names(M) <- paste(levs[-1],"vs earlier")
  attr(M, "desc") <- "Helmert contrasts"
  M
}

And applying this to our model's emmeans:

> wbA.lm1.emm <- emmeans(wbA.lm1, ~ tension)
> contrast(wbA.lm1.emm, "helmert")
 contrast     estimate    SE df t.ratio p.value
 M vs earlier    -20.6  6.13 24  -3.351  0.0027
 H vs earlier    -19.4 10.63 24  -1.830  0.0797

So the 'estimate' for 'M vs earlier' is twice the value given by the default coef(), and 'H vs earlier' is six times the value given by coef().
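The factors of 2 and 6 can be reproduced by hand: contrast() applies the Helmert columns directly as weights on the cell means, whereas the lm() coefficients carry extra divisors. A sketch using the cell means quoted above (`mu` and `est` are names I've introduced):

```r
mu <- c(L = 44.55556, M = 24.00000, H = 24.55556)

# Each Helmert column applied as weights on the means themselves
est <- t(contr.helmert(3)) %*% mu
est
# -20.55556 (= M - L)  and  -19.44444 (= 2*H - L - M)

# The lm() coefficients are these same quantities divided by 2 and 6
est / c(2, 6)
# -10.27778 and -3.24074, matching coef(wbA.lm1)
```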

I can't help feeling that this post brings me close to understanding why this happens, but the final answer eludes me. So apologies in advance if the respondents to that post have already answered this.

Regards,

Leon Barmuta


1 Answer


The post you link does indeed try to explain the same thing.

The basic point is this: A regression model relates the means $\mu$ to the regression coefficients $\beta$, via a matrix equation $\mu = X\beta$. This says that each $\mu_i$ is a linear function $\sum x_{ij}\beta_j$. Note that:

  1. The contrast codings contr.helmert and such give you the $x_{ij}$ values. Those are weights that are applied to the $\beta_j$.

  2. When you want to estimate a contrast of means, you are looking at something like $\sum c_i\mu_i$. Those $c_i$ are weights that are applied to the $\mu_i$ values.

The bottom line: Weights applied to the $\beta$s are not to be confused with weights applied to the $\mu$s.
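The distinction can be made concrete with the coding from the question (the names `X` and `W` below are mine): the columns of $X$ are the weights on the $\beta$s, while the rows of $X^{-1}$ are the weights on the $\mu$s that define each $\beta$.

```r
# mu = X %*% beta: the Helmert *coding* columns are weights on the betas
X <- cbind(1, contr.helmert(3))

# beta = solve(X) %*% mu: each row gives the mu-weights defining one beta
round(solve(X), 4)
# row 2: -0.5000  0.5000  0.0000  -> beta1 = (mu_M - mu_L)/2
# row 3: -0.1667 -0.1667  0.3333  -> beta2 = (2*mu_H - mu_L - mu_M)/6
```

The divisors of 2 and 6 observed in the question come from this inversion: the coding columns (−1, 1, 0) and (−1, −1, 2) turn into the rescaled rows of the inverse.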

Russ Lenth
  • Excellent, and thanks for your patience. For others still confused by this issue: I found the vignette from Bill Venables's "codingMatrices" package (which I have only just discovered) really helped clarify the difference between coding matrices and contrast matrices. – LimnoLeon May 20 '22 at 00:45
  • I simply don't pay attention to the regression coefficients at all, unless perhaps in an ordinary multiple regression situation with no factors. I think packages like my emmeans simply make it unnecessary to try to interpret regression coefficients. What matters is what the model predicts, and how much some predictions differ from other predictions. – Russ Lenth May 20 '22 at 01:09