I'm a little fuzzy on the exact assumptions needed for mixed/fixed effects models. As an example, let's say we're trying to model the effect of age on a person's 5k time, and we have a dataset of race times by person, by year (so multiple observations per person at different ages).

From what I remember, the naive OLS way would be to regress race time on age. We don't want to do this because there are multiple observations per person, which violates the OLS assumption of independent observations. We can introduce a random effect for person to the model, which "allows" for each person to have their own intercept and lets us see the within-subject effect of age on race time. I believe this is a pretty standard way to deal with multiple observations per subject.

However, what's the difference between:

  1. Using a mixed-effects model as specified, and
  2. Using a fixed-effects model but using dummy variables for each person? In essence, why can't we just regress race time on age + Person A + Person B + ..., where Person [x] is a dummy for a particular person in the data? Isn't this also effectively allowing each person to have their own intercept?

2 Answers

The difference is that the random-effects model has a shrinkage effect. In the fixed-effects model, each person gets the offset that best fits that person's own data, independent of everyone else's offsets. The random-effects model also tries to make the offsets similar, by treating them as draws from a common distribution. The idea is that one person's offset tells you something about another person's offset, because they are both, after all, "persons". This helps especially when some persons have lots of data, so you are quite sure about their fitted offsets, while others have very little data and therefore very uncertain offsets. In that case, the requirement that all offsets be similar draws them together ("shrinking" them toward the overall mean), and the uncertain offsets are pulled the most, toward the level suggested by the well-estimated ones.
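To make the shrinkage concrete, here is a minimal sketch for an intercept-only version of the problem (no age term). It assumes the two variance components are known, which a real mixed model would estimate from the data; the values of `tau2` and `sigma2`, the group sizes, and the common mean of 25 minutes are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variance components, assumed known for this illustration.
tau2 = 4.0    # between-person variance of the offsets
sigma2 = 9.0  # within-person (residual) variance

# Two persons with lots of data, two with very little.
n_obs = np.array([50, 30, 2, 1])
true_offsets = rng.normal(0.0, np.sqrt(tau2), size=n_obs.size)

# Simulated race times (minutes) around a common mean of 25.
times = [25.0 + a + rng.normal(0.0, np.sqrt(sigma2), size=n)
         for a, n in zip(true_offsets, n_obs)]
person_means = np.array([t.mean() for t in times])

# Precision-weighted estimate of the overall mean.
w = 1.0 / (tau2 + sigma2 / n_obs)
mu_hat = np.sum(w * person_means) / np.sum(w)

# "Fixed-effects" offsets: each person's own mean, relative to mu_hat.
fixed_offsets = person_means - mu_hat

# "Random-effects" offsets: the same quantity multiplied by a weight in (0, 1)
# that is small when a person has few observations.
shrink_weight = tau2 / (tau2 + sigma2 / n_obs)
random_offsets = shrink_weight * fixed_offsets

for n, f, r, s in zip(n_obs, fixed_offsets, random_offsets, shrink_weight):
    print(f"n={n:2d}  fixed={f:+.2f}  random={r:+.2f}  weight={s:.2f}")
```

The weight `tau2 / (tau2 + sigma2 / n)` is close to 1 for the person with 50 races and much smaller for the person with a single race, so the poorly estimated offsets are the ones pulled hardest toward the overall mean.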

frank

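Using dummy variables for each person is called the LSDV (least squares dummy variable) approach. Given the same set of regressors, the coefficient on "age" is identical under LSDV and under the fixed-effects (within) estimator, even though the two are computed differently: LSDV estimates one dummy coefficient per person, while the within estimator subtracts each person's mean from the data before running the regression.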
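If you want to convince yourself of this equivalence, a small simulation is enough. The sketch below uses made-up data (six hypothetical runners, five yearly races each, an assumed true age effect of 0.15 minutes per year) and plain NumPy rather than any particular panel-data package.

```python
import numpy as np

rng = np.random.default_rng(1)

n_people, n_years = 6, 5
person = np.repeat(np.arange(n_people), n_years)

# Each person starts at a different age and is observed once a year.
start_age = rng.integers(20, 50, size=n_people)
age = (start_age[person] + np.tile(np.arange(n_years), n_people)).astype(float)

# Person-specific intercepts plus noise (all values are illustrative).
alpha = rng.normal(0.0, 3.0, size=n_people)
race_time = 20.0 + 0.15 * age + alpha[person] + rng.normal(0.0, 0.5, size=person.size)

# LSDV: regress on age plus one dummy per person (no separate intercept).
dummies = (person[:, None] == np.arange(n_people)).astype(float)
X = np.column_stack([age, dummies])
coef, *_ = np.linalg.lstsq(X, race_time, rcond=None)
beta_lsdv = coef[0]

# Within estimator: subtract each person's mean, then regress without dummies.
def demean(v):
    means = np.bincount(person, weights=v) / np.bincount(person)
    return v - means[person]

age_w, time_w = demean(age), demean(race_time)
beta_within = (age_w @ time_w) / (age_w @ age_w)

print(beta_lsdv, beta_within)
```

The two printed numbers agree up to floating-point error, which is the Frisch-Waugh-Lovell result: partialling out the dummies is the same as demeaning by person.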

Fixed effects/LSDV are used when the individual effects are not treated as random and are correlated with the independent variables. Random effects, conversely, assume that there is no such correlation. This contrast is the basis of the Hausman test, which you can use to help choose between fixed and random effects.
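For reference, the usual form of the Hausman statistic compares the two sets of coefficient estimates on the time-varying regressors. The notation here is my own, not from any particular textbook: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the fixed- and random-effects estimates and $k$ is the number of coefficients being compared.

$$
H = (\hat\beta_{FE} - \hat\beta_{RE})^\top \left[\widehat{\operatorname{Var}}(\hat\beta_{FE}) - \widehat{\operatorname{Var}}(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k \quad \text{(asymptotically, under } H_0\text{)}
$$

Under the null hypothesis that the individual effects are uncorrelated with the regressors, both estimators are consistent and $H$ should be small; a large $H$ suggests the random-effects assumption fails. With age as the only regressor, $k = 1$.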

As for the fixed-effects, LSDV, or dummy-variable approach, there are some things to keep in mind. First, these models don't allow you to include any time-invariant independent variables, because such variables are perfectly collinear with the person dummies. For example, you can't estimate the effects of gender, race, etc. If you are including only age as the independent variable, you don't need to worry about this. But I would still suggest including some control variables that may affect the race time.
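The reason time-invariant variables drop out can be checked directly on the design matrix. This is a minimal sketch with four hypothetical runners and a made-up 0/1 `gender` column; it only inspects matrix rank and does not fit any model.

```python
import numpy as np

n_people, n_years = 4, 3
person = np.repeat(np.arange(n_people), n_years)
dummies = (person[:, None] == np.arange(n_people)).astype(float)

# Age changes from year to year; gender is constant within each person.
age = np.array([30.0, 41.0, 25.0, 52.0])[person] + np.tile(np.arange(n_years), n_people)
gender = np.array([0.0, 1.0, 1.0, 0.0])[person]

X_age = np.column_stack([age, dummies])           # 5 columns
X_both = np.column_stack([age, gender, dummies])  # 6 columns

rank_age = np.linalg.matrix_rank(X_age)
rank_both = np.linalg.matrix_rank(X_both)
print(rank_age, X_age.shape[1])    # full column rank
print(rank_both, X_both.shape[1])  # one column is redundant

# Equivalent view: demeaning by person wipes out gender entirely.
gender_means = np.bincount(person, weights=gender) / np.bincount(person)
gender_within = gender - gender_means[person]
```

Because `gender` is an exact sum of some of the person dummies, adding it does not raise the rank of the design matrix, so its coefficient is not identified. Age survives because it varies within each person.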

Second, the dummy variables use up degrees of freedom, because you are estimating more coefficients (one for each individual).

  • I'd just like to point out that using a test (e.g. Hausman) to decide on your model is not considered good practice because it creates problems with the validity of inference down the line. – Stefan Jul 19 '22 at 06:11
  • @Stefan Can you please elaborate more on your point, how does it create problems? Maybe with an example. Thank you for your input. – Spur Economics Jul 19 '22 at 07:44
  • The problem is that using the same data for model selection and inference creates problems of data dredging (see e.g. https://stats.stackexchange.com/a/20856/200803), a keyword here is inference after model selection. – Stefan Jul 19 '22 at 08:08