Why do we use n-1 effect codes and -1 values in sum (effect) coding in linear regression? Why can't we treat all the effects equally?

Asked Mar 29 '24 at 23:53

Active Mar 29 '24 at 23:54

When using effect coding, also called sum coding (AFAIK the two are the same thing), in linear regression, the contrasts for a four-level factor are coded as:

            condition1  condition2  condition3
condition1           1           0           0
condition2           0           1           0
condition3           0           0           1
condition4          -1          -1          -1

Then the model output (in R, for example) reports confidence intervals and p-values only for conditions 1-3. If we want a CI for condition 4, we have to run the model again with a different condition taking the -1 row.
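To make the setup concrete, here is a minimal sketch in Python/NumPy rather than R. The data are simulated, the design is balanced, and every name in it (`C`, `true_means`, `effect4`, and so on) is an illustrative assumption, not something taken from a real analysis.

```python
import numpy as np

# Sum (effect) coding for a 4-level factor: rows = levels, columns = codes.
# The last level gets the -1 row, as in the table above.
C = np.vstack([np.eye(3), -np.ones((1, 3))])

rng = np.random.default_rng(0)
levels = np.repeat(np.arange(4), 25)          # balanced: 25 observations per level
true_means = np.array([1.0, 2.0, 4.0, 5.0])   # made-up cell means (assumption)
y = true_means[levels] + rng.normal(scale=1.0, size=levels.size)

# Design matrix: intercept column plus the three effect-code columns.
X = np.column_stack([np.ones(levels.size), C[levels]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Observed cell means and their unweighted average (the "grand mean").
cell_means = np.array([y[levels == k].mean() for k in range(4)])
grand_mean = cell_means.mean()

# The intercept estimates the grand mean; each slope estimates one
# condition's deviation from it. Condition 4 has no coefficient of its
# own: its deviation is implied by the sum-to-zero constraint.
effect4 = -beta[1:].sum()
```

In this sketch `beta[0]` matches `grand_mean`, `beta[1:]` matches the first three entries of `cell_means - grand_mean`, and `effect4` matches the fourth, so only three of the four deviations appear as coefficients in the fit.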

My question is, why can't we just have something like:

            condition1  condition2  condition3  condition4
condition1           1           0           0           0
condition2           0           1           0           0
condition3           0           0           1           0
condition4           0           0           0           1

If we're able to get a CI for all four conditions' difference from the grand mean using two models, why can't we just make a model that does that the first time? Why do we have to treat one condition differently from the rest, when effect coding is supposedly a symmetrical model (unlike dummy coding)?
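As a rough check of what the proposed coding does, the sketch below builds a design matrix with an intercept plus one 0/1 column per condition and looks at its rank. It assumes the model keeps an intercept (the grand mean); the group size and the coefficient vectors `b1` and `b2` are arbitrary choices made up for illustration.

```python
import numpy as np

levels = np.repeat(np.arange(4), 5)         # 5 observations per level (arbitrary)
indicators = np.eye(4)[levels]              # one 0/1 column per condition
X_full = np.column_stack([np.ones(levels.size), indicators])

# Number of linearly independent columns in the design matrix.
rank_full = np.linalg.matrix_rank(X_full)

# Two different coefficient vectors (intercept first) that produce
# identical fitted values: raise the intercept by 10 and lower every
# condition coefficient by 10.
b1 = np.array([3.0, -1.0, 0.0, 1.0, 2.0])
b2 = b1 + np.array([10.0, -10.0, -10.0, -10.0, -10.0])
same_fit = np.allclose(X_full @ b1, X_full @ b2)
```

Here `rank_full` is 4 even though `X_full` has 5 columns, because the four indicator columns add up to the intercept column, and `same_fit` is true. So with an intercept, the data alone cannot separate five coefficients. The -1 row appears to be one way of imposing the sum-to-zero constraint that removes this redundancy.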

edited Mar 29 '24 at 23:54

asked Mar 29 '24 at 23:53

edetone
