I'm trying to understand why the values under 'estimate' from an emmeans contrast function differ from the default 'Estimate' values reported by, say, 'summary.lm()' in R. As an example, let's use Helmert contrasts (sometimes called 'reverse Helmert', but R's way of coding them suits my purpose) on a subset of the 'warpbreaks' data set.
Default R output
> data("warpbreaks")
> contrasts(warpbreaks$tension) <- contr.helmert(3)
> wbA <- subset(warpbreaks, wool == "A")
> wbA.lm1 <- lm(breaks ~ tension, data = wbA)
And the coefficients give me the differences that I'd expect based on the cell means:
> coef(wbA.lm1)
(Intercept)    tension1    tension2
  31.037037  -10.277778   -3.240741
For the record the cell means are:
       L        M        H
44.55556 24.00000 24.55556
So that, for example, 'tension1' is the mean of the first two cell means minus the mean of the first level: (44.55556 + 24.00000)/2 - 44.55556 = -10.27778.
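As a quick check of that arithmetic, here is a minimal base-R sketch that recovers all three coefficients from the cell means alone. The names `cell_means`, `X`, and `beta` are mine, introduced only for illustration; the sketch relies on the fact that a one-way model with a single factor reproduces the cell means exactly.

```r
# Cell means of breaks by tension for wool A (warpbreaks ships with base R)
wbA <- subset(warpbreaks, wool == "A")
cell_means <- as.numeric(tapply(wbA$breaks, wbA$tension, mean))

# Coding matrix: an intercept column plus the two Helmert columns
X <- cbind(1, contr.helmert(3))

# The fitted values equal the cell means, so X %*% beta = cell_means
beta <- solve(X, cell_means)
names(beta) <- c("(Intercept)", "tension1", "tension2")
round(beta, 6)   # should agree with coef(wbA.lm1)

# 'tension1' written out in words: mean of the first two means minus the first
mean(cell_means[1:2]) - cell_means[1]
```

The same idea gives 'tension2' as the mean of all three cell means minus the mean of the first two.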
emmeans estimates
Function to compute (reverse) Helmert:
helmert.emmc <- function(levs, ...) {
    M <- as.data.frame(contr.helmert(levs))
    names(M) <- paste(levs[-1], "vs earlier")
    attr(M, "desc") <- "Helmert contrasts"
    M
}
And applying this to our model's emmeans:
> library(emmeans)
> wbA.lm1.emm <- emmeans(wbA.lm1, ~ tension)
> contrast(wbA.lm1.emm, "helmert")
 contrast     estimate    SE df t.ratio p.value
 M vs earlier    -20.6  6.13 24  -3.351  0.0027
 H vs earlier    -19.4 10.63 24  -1.830  0.0797
So the 'estimate' for 'M vs earlier' is twice the value given by the default coef(), and 'H vs earlier' is six times the value given by coef().
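The factors of two and six can be reproduced without emmeans. This is a minimal sketch, assuming (as the printed estimates suggest) that 'contrast()' applies each column of 'contr.helmert(3)' directly as weights on the cell means. The names `cell_means`, `C`, `raw`, and `ssq` are illustrative and not part of either package.

```r
wbA <- subset(warpbreaks, wool == "A")
cell_means <- as.numeric(tapply(wbA$breaks, wbA$tension, mean))

C <- contr.helmert(3)   # columns: (-1, 1, 0) and (-1, -1, 2)

# Weighted sums of the cell means, one per contrast column
raw <- drop(t(C) %*% cell_means)
round(raw, 1)           # compare with the 'estimate' column above

# Sum of squared weights in each column
ssq <- colSums(C^2)
ssq

# Dividing each weighted sum by its sum of squared weights
# gives the lm() coefficients for tension1 and tension2
raw / ssq
```

This only shows where the numbers come from: the weighted sums are on the scale of the raw contrast weights, while the 'lm()' coefficients are those sums divided by 2 and 6 respectively.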
I can't help feeling that I'm close to understanding why this happens from this post, but the final answer eludes me. So apologies in advance if the respondents to that post have answered this.
Regards,
Leon Barmuta