I've got a sample (n = 7000) with a binary dependent variable that has many 0s and only a few 1s (150 total), so I want to run Firth's bias-reduced logistic regression to estimate the probability of the 1s. So far so good, but my explanatory variables include fixed effects for the individuals (= their individual names). Each individual has between 2 and 50 cases in the data, and all in all there are about 120 individuals. An individual's covariate values can change from trial to trial. As I want to include the individuals as fixed effects, I convert them to dummies. Running Firth's regression with the R logistf package now produces several errors because of the big number of individual dummies. Is there a way to include a big number of dummies in another way, like (1|Individual) with glmer() in the lme4 package? It would be great to make this more manageable for Firth's regression.
Do you have multiple observations/trials per individual? If so, about how many? What's the total number of 1s and 0s? Do covariates for individuals differ from trial to trial? I ask because I wonder whether the Firth penalization is really necessary and if this might be better handled by a standard generalized linear mixed model. Please provide that information by editing the question, as comments are easy to overlook and can be deleted. – EdM Mar 18 '22 at 18:29
Sorry for submitting a new answer, but I'm not able to comment since I don't have enough reputation yet. Thanks for your answer. 1) Exactly, I've got between 2 and 50 observations (at least 2 observations per individual and up to 50 for a few individuals). 2) The total number of observations is about 7000 and there are 150 1s, so nearly just 2%. 3) Yes, covariates do sometimes differ from trial to trial. The basic idea is to evaluate the performance of track and field athletes, so the fixed effects are the individual name, the discipline, and the tournament, plus some controls relating to daily circumstances. – Norbi Mar 18 '22 at 20:07
2 Answers
I am not aware of any generalization of Firth's penalized likelihood approach to mixed effect models (i.e., it is not clear how one ought to penalize the hierarchical scale parameter in this setting).
However, note that Firth's penalized likelihood approach is basically a Bayesian maximum a-posteriori approach using Jeffreys prior. Thus, one thing you could do is use software intended for (amongst other things) fitting Bayesian generalized linear mixed effects models. There are a number of convenient R packages like brms (or also rstanarm) that would easily allow you to do something like that using familiar R formula syntax. You'd simply have to decide on appropriate prior distributions, for which you might, e.g., choose something rather weakly informative in order to obtain something not too different from the Firth approach in simple settings (assuming that is desirable to you).
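For instance, here is a minimal sketch with brms; the data frame dat and the column names y, x, and Individual are illustrative assumptions, as are the specific weakly informative priors:

    library(brms)

    # A sketch, assuming a data frame `dat` with a 0/1 outcome `y`, a
    # covariate `x`, and a factor `Individual`; all names are illustrative.
    fit <- brm(
      y ~ x + (1 | Individual),
      data = dat, family = bernoulli(),
      prior = c(
        prior(normal(0, 2.5), class = "b"),        # weakly informative slopes
        prior(student_t(3, 0, 2.5), class = "Intercept"),
        prior(student_t(3, 0, 2.5), class = "sd")  # half-t on the group SD
      )
    )
    summary(fit)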
In general, Bayesian approaches with proper priors are pretty good at dealing with sparse binary/binomial data, provided you set sensible prior distributions; they often have better frequentist operating characteristics in plausible scenarios than competing frequentist methods.
Thanks for your answer! Running it with brms could be a possible way. I actually tried with logistf and had the problems. – Norbi Mar 18 '22 at 17:24
The big problem here is the small number of events per predictor, as you want to include the individuals as fixed effects. It's not clear that the Firth penalization is the best solution to that problem.
To avoid overfitting you typically need about 10-20 cases in the minority class (events) per predictor in the model. With only 150 events and 120 individuals treated as fixed effects, plus other covariates, you are down to barely more than 1 event per predictor, so some type of penalization is called for.
First, the original Firth method penalizes both the regression coefficients and the intercept toward 0. While that reduces small-sample bias in the predictor coefficients, it also biases the intercept toward 0, so that probability predictions are biased toward 0.5. The logistf package now provides modifications that help avoid that problem.
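One hedged sketch of that correction, assuming the flic() intercept-correction function available in current logistf versions, with illustrative names dat, y, x, and Individual:

    library(logistf)

    # A sketch, assuming a data frame `dat` with binary outcome `y`, a
    # covariate `x`, and a factor `Individual`; R expands the factor into
    # dummies itself, which avoids hand-coding errors.
    fit_firth <- logistf(y ~ x + Individual, data = dat)

    # FLIC re-estimates the intercept after the Firth fit so that predicted
    # probabilities are no longer pulled toward 0.5.
    fit_flic <- flic(fit_firth)
    summary(fit_flic)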
Second, penalization is tricky with dummy variables.* As Frank Harrell explains in Section 9.10 of the second edition of Regression Modeling Strategies, with penalization "one will get different predictions depending on which cell is chosen as the reference cell when constructing dummy variables." For a categorical predictor having multiple levels, he recommends penalizing the sum of squared differences of the individual regression coefficients from their mean. The lrm() and pentrace() functions in his rms package can implement that approach in the broader context of penalized maximum likelihood.
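A minimal sketch of that approach (again with illustrative names; the penalty grid is arbitrary):

    library(rms)

    # A sketch, assuming `dat` with binary `y`, covariate `x`, and a factor
    # `Individual`; x = TRUE, y = TRUE store the design matrix for pentrace().
    fit <- lrm(y ~ x + Individual, data = dat, x = TRUE, y = TRUE)

    # Search a grid of penalties; for multi-level factors rms penalizes the
    # squared differences of the dummy coefficients from their mean, so the
    # fit does not depend on the choice of reference level.
    pt <- pentrace(fit, c(0, 0.5, 1, 2, 4, 8, 16))
    fit_pen <- update(fit, penalty = pt$penalty)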
Another choice would be a simple mixed-effect model, with the individuals treated as random effects and fixed effects limited to the covariates. In that case, the much higher event/predictor ratio might get around the small-sample coefficient bias for covariates adequately for your purposes. Random intercepts for the individuals would represent their differences from the log-odds at baseline covariate values and thus still provide some basis for examining differences among them.
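For example, a sketch with lme4 (x1 and x2 are illustrative covariate names):

    library(lme4)

    # A sketch: individuals as random intercepts, fixed effects limited to
    # the covariates.
    fit_mm <- glmer(y ~ x1 + x2 + (1 | Individual),
                    data = dat, family = binomial)

    # Each individual's estimated deviation from the baseline log-odds:
    ranef(fit_mm)$Individual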
If you are intent on treating the individuals as fixed effects, I suppose that Björn's suggestion to use a Bayesian approach (+1) could be extended to a fixed-effects model.
Finally, I'd suggest that you read this related thread. The answer from StasK shows how to treat clusters (individuals in your case) as fixed effects with a Firth regression.* If you want to use a Bayesian approach in a mixed model, the answer from Ben Bolker illustrates how to use the blme package, a Bayesian-modeling wrapper for lme4, to use a weak prior on the fixed effects to accomplish something similar to the Jeffreys prior that corresponds to Firth penalization.
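Along the lines of Ben Bolker's answer there, a hedged sketch with blme; the prior covariance and the variable names are illustrative assumptions:

    library(blme)

    # A sketch: a weak normal prior on the fixed effects stabilizes the fit,
    # loosely analogous to Firth's penalty. diag(9, 3) assumes 3 fixed-effect
    # parameters (intercept, x1, x2), each with prior variance 9.
    fit_blme <- bglmer(y ~ x1 + x2 + (1 | Individual),
                       data = dat, family = binomial,
                       fixef.prior = normal(cov = diag(9, 3)))
    summary(fit_blme)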
*Did you construct the dummies yourself, or did you define a single Individual categorical predictor (factor) covering all individuals and let R create the corresponding dummies? The latter is much less error-prone; mistakes in hand-coded dummy variables can lead to error messages when you try to fit a model.