This R code outputs the eta squared from an ANOVA:

y     <- c(rnorm(30, 3), rnorm(30, 4), rnorm(30, 5))
x     <- sort(rep(paste("treatment", 1:3), 30))
xy    <- data.frame(x,y)
xyaov <- aov(y ~ x, xy)

library(heplots)
etasq(xyaov)

          Partial eta^2
x             0.4807356
Residuals            NA

How can I write code to calculate eta squared "by hand", i.e. without using the ready-made etasq function?

amoeba
luciano

2 Answers

It rather depends on what you mean by "by hand".

There is more than one way to do it. You can use the residuals:

> etasq(xyaov)
          Partial eta^2
x             0.4854899
Residuals            NA
> 1 - var(xyaov$residuals)/var(y)
[1] 0.4854899

(You didn't set a seed, so we don't have exactly the same result).

Almost equivalently, you can use the predicted values:

> var(predict(xyaov)) / var(y)
[1] 0.4854899
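
As a short sketch of why the two variance ratios above reproduce eta squared: assuming the model contains an intercept (it does here), the residuals $e_i$ average to zero and the fitted values $\hat{y}_i$ average to $\bar{y}$, so the $n - 1$ in each variance cancels and only sums of squares are left:

\begin{align}
1 - \frac{\operatorname{var}(e)}{\operatorname{var}(y)} &= 1 - \frac{SS_\text{error}/(n-1)}{SS_\text{total}/(n-1)} = \frac{SS_\text{total} - SS_\text{error}}{SS_\text{total}} = \frac{SS_\text{between}}{SS_\text{total}} \\[10pt]
\frac{\operatorname{var}(\hat{y})}{\operatorname{var}(y)} &= \frac{SS_\text{between}/(n-1)}{SS_\text{total}/(n-1)} = \frac{SS_\text{between}}{SS_\text{total}}
\end{align}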

You can use the sums of squares from the ANOVA table, although the expression that extracts them is rather unintuitive:

 > summary(xyaov)[[1]][[2]][[1]] / (summary(xyaov)[[1]][[2]][[2]] + summary(xyaov)[[1]][[2]][[1]] )
[1] 0.4854899
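
If the nested indexing is hard to read, a possibly clearer sketch is to pull the sums of squares out by column name first. The names `ss` and `eta.sq` are mine, and the seed is arbitrary, so the value will not match the output above:

```r
set.seed(55)  # arbitrary seed, only so the example is reproducible
y     <- c(rnorm(30, 3), rnorm(30, 4), rnorm(30, 5))
x     <- sort(rep(paste("treatment", 1:3), 30))
xyaov <- aov(y ~ x, data.frame(x, y))

# summary() of an aov fit is a list whose first element is the ANOVA table;
# its "Sum Sq" column holds the SS for x followed by the SS for Residuals
ss     <- summary(xyaov)[[1]][["Sum Sq"]]
eta.sq <- ss[1] / sum(ss)
eta.sq
```

This only works as written for a one-factor model, where the first entry is the single effect and the sum of the column is the total sum of squares.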

You can use summary.lm and get the R^2 (because in a one-way ANOVA, R-squared is the same quantity as eta squared):

> summary.lm(xyaov)$r.squared
[1] 0.4854899

You can do it with no reference to the aov() function by calculating the mean for each group, then the residual, then eta squared based on that:

xy <- data.frame(x, y)  # data.frame() keeps y numeric; cbind(x, y) would coerce both columns to character
x.means <- as.data.frame(tapply(y, x, mean))  # one row per group, one column of group means
x.means$x <- row.names(x.means)
xy <- merge(x.means, xy, by = "x")
xy$resid <- xy$y - xy[, 2]  # after the merge, column 2 holds each observation's group mean
1 - var(xy$resid) / var(xy$y)
[1] 0.4854899
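
For what it's worth, the same from-scratch calculation can probably be written more compactly with ave(), which returns each observation's group mean. This is a sketch rather than the only way to do it; the variable names are mine and the seed is arbitrary, so the number will differ from the one above:

```r
set.seed(55)  # arbitrary seed, only so the example is reproducible
y <- c(rnorm(30, 3), rnorm(30, 4), rnorm(30, 5))
x <- sort(rep(paste("treatment", 1:3), 30))

group.mean <- ave(y, x)                       # group mean repeated for every observation
ss.between <- sum((group.mean - mean(y))^2)   # variation of group means around the grand mean
ss.total   <- sum((y - mean(y))^2)            # variation of y around the grand mean
eta.sq     <- ss.between / ss.total
eta.sq
```

No model is fitted at any point, so this is about as "by hand" as it gets.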
Jeremy Miles

Eta-squared ($\eta^2$) is a measure of effect size for ANOVA models that is analogous to $R^2$. That is, it gives the proportion of the variability in $Y$ that can be accounted for by knowledge of $X$. There is a 'regular' $\eta^2$ and a partial $\eta^2$. This distinction only comes into play when you have an ANOVA with multiple factors. Here are the formulas:
\begin{align}
\eta^2_\text{(regular)} &= \frac{SS_\text{between}}{SS_\text{total}} \\[10pt]
\eta^2_\text{partial} &= \frac{SS_\text{factor}}{SS_\text{factor} + SS_\text{error}}
\end{align}

For the latter, only a specific factor is implied, and the sums of squares associated with other factors in the model would not enter into the calculation.
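
To see the two formulas give different answers, here is a minimal sketch with two factors. The data, the factor names `a` and `b`, and the effect sizes are made up purely for illustration; the design is balanced, so the order of the terms does not affect the sums of squares:

```r
set.seed(1)  # arbitrary seed
# Hypothetical balanced design: 3 levels of a, 2 levels of b, 10 replicates per cell
d   <- expand.grid(a = factor(1:3), b = factor(1:2), rep = 1:10)
d$y <- rnorm(nrow(d), mean = as.numeric(d$a) + 2 * as.numeric(d$b))

tab <- anova(aov(y ~ a + b, d))
ss  <- setNames(tab[["Sum Sq"]], rownames(tab))  # named: a, b, Residuals

eta.sq.a         <- unname(ss["a"] / sum(ss))                      # SS for b stays in the denominator
partial.eta.sq.a <- unname(ss["a"] / (ss["a"] + ss["Residuals"]))  # SS for b is left out
c(regular = eta.sq.a, partial = partial.eta.sq.a)
```

Because the partial version drops the sum of squares for `b` from the denominator, it can never be smaller than the regular version. With a single factor there is nothing to drop, and the two coincide.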

For your example, the top formula would be used:

set.seed(55)
y     <- c(rnorm(30, 3), rnorm(30, 4), rnorm(30, 5))
x     <- sort(rep(paste("treatment", 1:3), 30))
xy    <- data.frame(x,y)
xyaov <- aov(y ~ x, xy)

anova(xyaov)
# Analysis of Variance Table
# 
# Response: y
#           Df Sum Sq Mean Sq F value   Pr(>F)    
# x          2 62.808  31.404  33.622 1.52e-11 ***
# Residuals 87 81.260   0.934                     
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

anova(xyaov)[1,2]/sum(anova(xyaov)[,2])
# [1] 0.435961