2

I am running a regression to estimate how proximity to Tuskegee (Macon County) and the percentage of the African-American population affect the COVID-19 vaccination rate at the county level.

reg vaccination_perc_points c.dist_Tuskegee##c.perc_points_afri_am i.urban_rual i.education_low_high median_income perc_points_male social_capital_index perc_points_am_indian_alaka_native perc_points_asian, robust

Now I would like to add state-level effects, and I am wondering whether they should be random or fixed. My data cover every possible realization of state, which would point to fixed effects. But I am only interested in a fraction of my population (African-American), which, as far as I understand, favors random effects.

  • I would have thought random. Otherwise you are estimating 50 parameters in which you have no interest. – mdewey Nov 27 '21 at 13:42
  • One suggestion from the literature is to choose random effects based on dispersion, high variances suggest random effects. – user78229 Feb 13 '24 at 01:25

3 Answers

2

Random effects are best used when you are trying to summarize a lot of variation between groups, people, or any other cluster of data (see lengthier discussion here). In this case, 50 clusters (states) is a lot of groups to consider, and state is not likely to be your primary theoretical effect of interest. I therefore believe your assessment is correct that the states can be entered as random effects.

If you are specifically curious about how each state varies but don't want to enter them as fixed effects, you can use caterpillar plots (shown in the previously linked answer) to show how much each state varies by 1) average increase/decrease in the dependent variable or 2) variation in slopes.
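To make the two options concrete, the models can be written out as follows. This is only a sketch: the indices $c$ (county) and $s$ (state), the regressor vector $x_{cs}$, and the choice of $d_{cs}$ (for example the distance to Tuskegee) as the variable whose slope varies by state are illustrative assumptions, not something specified in the question.

```latex
% State fixed effects: one free intercept \alpha_s per state
y_{cs} = x_{cs}^{\top}\beta + \alpha_s + \varepsilon_{cs}

% State random effects: a random intercept u_{0s} and, optionally,
% a random slope u_{1s} on a regressor d_{cs}
y_{cs} = x_{cs}^{\top}\beta + u_{0s} + u_{1s}\, d_{cs} + \varepsilon_{cs},
\qquad
\begin{pmatrix} u_{0s} \\ u_{1s} \end{pmatrix}
\sim \mathcal{N}\!\left(0, \Sigma_u\right),
\qquad
\varepsilon_{cs} \sim \mathcal{N}\!\left(0, \sigma^2\right)
```

In this notation, a caterpillar plot of the predicted $u_{0s}$ shows how each state's average level differs, and a plot of the predicted $u_{1s}$ shows how the slopes differ.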

1

Individual attributes are usually modeled as random effects.

Random effects involve partial pooling across categories: the samples from each category are assumed to come from the same higher-level population and to be somewhat similar. This works well even when each category has few observations, because the estimate for one category also uses data from the other categories.
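The pooling idea can be illustrated with a small sketch. It assumes the within-group and between-group variances are known, uses the plain mean of all observations as the grand mean, and runs on made-up data for three hypothetical states; a real mixed model estimates the variances from the data, so this is an illustration of the shrinkage weight rather than a substitute for fitting one.

```python
import random
import statistics


def partial_pooling(groups, sigma2_within, tau2_between):
    """Shrink each group's mean toward the grand mean.

    groups: dict mapping a group label to a list of observations.
    sigma2_within: within-group variance (assumed known here).
    tau2_between: between-group variance (assumed known here).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.fmean(all_values)
    pooled = {}
    for label, values in groups.items():
        n = len(values)
        group_mean = statistics.fmean(values)
        # Weight on the group's own mean: near 1 for large n, smaller for small n.
        weight = tau2_between / (tau2_between + sigma2_within / n)
        pooled[label] = weight * group_mean + (1 - weight) * grand_mean
    return grand_mean, pooled


random.seed(0)
# Hypothetical data: three "states" with very different numbers of counties.
true_means = {"A": 50.0, "B": 55.0, "C": 45.0}
sizes = {"A": 100, "B": 10, "C": 2}
groups = {
    s: [random.gauss(true_means[s], 10.0) for _ in range(sizes[s])]
    for s in true_means
}

grand_mean, pooled = partial_pooling(groups, sigma2_within=100.0, tau2_between=25.0)
for s in groups:
    raw = statistics.fmean(groups[s])
    print(f"{s}: n={sizes[s]:3d}  raw mean={raw:6.2f}  pooled={pooled[s]:6.2f}")
```

With these assumed variances, the group with only two observations gets a weight of 1/3 on its own mean, while the group with 100 observations keeps a weight of about 0.96, so small groups borrow the most from the rest of the data.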

See: What is the difference between fixed effect, random effect and mixed effect models?

  • Individual attributes are usually modeled as random effects.---This is too specific to subject-level research and can probably be lumped into just cluster-level effects, which you hint at with your second paragraph. – Shawn Hemelstrand Feb 13 '24 at 01:11
-1

Random effects plus a geographic control (e.g. a Gaussian process).

The answer to “fixed or random effects” is pretty much “always use random effects.” If you have more than 2 group effects to estimate, the unpooled fixed-effects estimates are inadmissible under squared-error loss, i.e. dominated by a shrinkage estimator (Stein’s phenomenon).

There’s really no good reason to use fixed effects when you have the option of random effects, unless your goal is to save effort or computation.
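The Stein phenomenon this answer appeals to can be checked in a small simulation. The sketch below covers only the idealized textbook setting: independent normal observations with known unit variance, one observation per mean, shrinkage toward zero, and total squared-error loss. The function names, the number of means (50), and the simulated true means are illustrative assumptions, and the simulation says nothing about a fitted random-effects regression with covariates.

```python
import random


def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate, shrinking the vector x toward zero."""
    p = len(x)
    norm2 = sum(xi * xi for xi in x)
    if norm2 == 0.0:
        return [0.0] * p
    # Shrinkage factor, truncated at zero (the positive-part variant).
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [factor * xi for xi in x]


def simulate(p=50, reps=2000, seed=1):
    """Average total squared error of the unshrunk and shrunk estimates."""
    rng = random.Random(seed)
    # Hypothetical true group effects, held fixed across replications.
    theta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    raw_loss = 0.0
    js_loss = 0.0
    for _ in range(reps):
        # One noisy observation per group effect.
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(x)
        raw_loss += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        js_loss += sum((ji - t) ** 2 for ji, t in zip(js, theta))
    return raw_loss / reps, js_loss / reps


raw_mse, js_mse = simulate()
print(f"unshrunk estimates:    {raw_mse:.2f}")
print(f"James-Stein estimates: {js_mse:.2f}")
```

The comparison is of total squared error summed over all 50 means. An individual mean can still be estimated worse after shrinkage, which is one reason the choice between fixed and random effects is not settled by this result alone.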

  • There’s really no good reason to use fixed effects for anything, except to save effort. I don't understand this comment. Can you elaborate? Why would anybody use regression if fixed effects are never useful? – Shawn Hemelstrand Feb 13 '24 at 01:09
  • @ShawnHemelstrand Random effects regression is a kind of regression. – Closed Limelike Curves Feb 17 '24 at 04:53
  • My point isn't that fixed effects are never useful--they're definitely an improvement on no effects! But Stein's phenomenon shows that the random effects estimator will always have a smaller mean squared error than the fixed effects estimator. – Closed Limelike Curves Feb 17 '24 at 04:57
  • The mean squared error shouldn't be the only criterion for selecting fixed and random effects. If MSE was the only reason to run models, then overfitting regressions would be the norm in statistics. – Shawn Hemelstrand Feb 17 '24 at 05:38
  • @ShawnHemelstrand I mean out of sample MSE, or MSE of regression coefficients, not MSE of in-sample predictions. I suggest googling “Stein’s estimator.” – Closed Limelike Curves Feb 20 '24 at 07:00
  • It also doesn’t really matter what loss function you pick. Asymptotically, no regularization always loses to regularized estimation. – Closed Limelike Curves Feb 20 '24 at 07:04