3

In their excellent book Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, Singer and Willett advocate an iterative model-comparison technique for testing the effects of predictors: start with a baseline model, then add predictors one at a time in a series of subsequent models, using a likelihood ratio test at each step to determine whether adding the new predictor significantly reduces the -2LL deviance. Because each pair of models is nested, each LR test in effect tests the effect of the newly added predictor. If the -2LL deviance is significantly reduced, the new model is retained; if not, the previous model is retained.
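To make the procedure concrete, here is a minimal sketch of one such sequence of LR tests in Python, using simulated data and a Gaussian linear model. The data, variable names, and effect sizes are my own invention for illustration, not anything from the book:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # x2 has no true effect

def neg2ll(X, y):
    """-2 log-likelihood of a Gaussian linear model fit by OLS (MLE variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    return len(y) * (np.log(2 * np.pi * sigma2) + 1)

X0 = np.ones((n, 1))                        # baseline: intercept only
X1 = np.column_stack([np.ones(n), x1])      # baseline + x1
X2 = np.column_stack([np.ones(n), x1, x2])  # previous model + x2

results = {}
for label, smaller, larger in [("x1", X0, X1), ("x2", X1, X2)]:
    lr = neg2ll(smaller, y) - neg2ll(larger, y)  # drop in -2LL deviance
    p = stats.chi2.sf(lr, df=1)                  # one extra parameter
    results[label] = (lr, p)
    print(f"adding {label}: LR = {lr:.2f}, p = {p:.4g}")
```

With this setup, adding x1 produces a large drop in deviance (small p-value) and the new model is retained, while adding x2 does not, so the previous model is kept.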

They recommend this technique over simply entering all the predictors one intends to test into a single regression model and reporting the confidence intervals and p-values.

I have accepted their recommendation; however, I don't really understand on what grounds they consider their iterative approach superior to the 'all-in' approach.

The reason I ask is that I have now conducted analyses for several papers where my co-authors have been very confused by the iterative approach. It could be that I am not explaining it well enough, but I would like a more solid justification for using it than 'because Singer and Willett said so'.

A plain-language explanation would be very much appreciated.

llewmills
  • This sounds like stepwise regression, which can be useful but requires more delicate handling than your description of the text suggests is present in the book. (This is to say that my impression of that book, based on what is written here (I haven’t read it), is that it is not correct.) $//$ Do you mean to ask about a likelihood ratio test as opposed to, say, a Wald or score test? – Dave Mar 25 '24 at 01:37
  • Thanks for responding Dave. My understanding is that this is not the same thing as stepwise regression; it is testing the difference in -2LL deviance of nested models. I have added a few more details to make this clearer. – llewmills Mar 25 '24 at 03:33

1 Answer

1

These are nested models so each LR test in effect tests the effect of each new predictor.

If you want to define stepwise regression as using information criteria instead of p-values to decide which variables should be kept, fine, but this procedure follows the spirit of such stepwise regression and carries the same issues listed by Frank Harrell:

  1. It yields R-squared values that are badly biased to be high.
  2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
  3. The method yields confidence intervals for effects and predicted values that are falsely narrow; see Altman and Andersen (1989).
  4. It yields p-values that do not have the proper meaning, and the proper correction for them is a difficult problem.
  5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani [1996]).
  6. It has severe problems in the presence of collinearity.
  7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test prespecified hypotheses.
  8. Increasing the sample size does not help very much; see Derksen and Keselman (1992).
  9. It allows us to not think about the problem.
  10. It uses a lot of paper.

Issues 1, 2, 3, 4, and 7 are the most problematic in my mind. They center on the fact that stepwise-style regression cherry-picks the variables that happen to work well in the one given sample and then acts as though you knew from the beginning that those were the right variables, instead of accounting for the uncertainty in the stepwise selection process.
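A quick simulation illustrates the cherry-picking problem (a hypothetical sketch using pure-noise predictors; nothing here comes from the book or the question). If you screen several candidate predictors and report whichever looks best as though it had been prespecified, the nominal 5% false-positive rate is badly inflated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps = 50, 10, 300   # 10 candidate predictors, none truly related to y

single_hits = 0   # prespecified test of predictor 0 alone
best_hits = 0     # test of whichever predictor looks best in this sample

for _ in range(reps):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(k)]
    single_hits += pvals[0] < 0.05
    best_hits += min(pvals) < 0.05

print(f"prespecified predictor: {single_hits / reps:.2f} 'significant'")
print(f"best-of-{k} predictor:  {best_hits / reps:.2f} 'significant'")
```

The prespecified test rejects at roughly the nominal 5% rate, while the best-of-ten test rejects far more often, even though every predictor is pure noise. That inflation is exactly what issues 2, 3, and 4 above describe.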

Therefore, I disagree with your reference's claim that this is the statistically sound way to go.

However…

…I understand the appeal. Look at the issues. Performance metrics are biased high. Confidence intervals are falsely narrow (so p-values are falsely low). If you can pull a fast one on the reviewers and use this approach to make your results look better than they are, that makes it easier to publish your work and make your case for tenure/promotion/bonus/etc. This is basically the same as how it is easier to look at the smart kid’s exam answers than to learn the material by studying, though.

With the caveat that I only know about the source material from what is described in the question, this sounds like cheating, and I would discourage this practice.

(Stepwise-style regression can be useful if some care is taken to account for the uncertainty in the variable selection. However, this approach does not seem to do so.)
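To show why I call this stepwise-style, the manual add-one-predictor-at-a-time procedure can be written as exactly this kind of loop. This is a hypothetical sketch of my own (names, threshold, and data are assumptions, not from the book or the question):

```python
import numpy as np
from scipy import stats

def neg2ll(X, y):
    """-2 log-likelihood of a Gaussian linear model fit by OLS (MLE variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    return len(y) * (np.log(2 * np.pi * sigma2) + 1)

def forward_select(candidates, y, alpha=0.05):
    """Add predictors one at a time while the LR test rejects at level alpha."""
    n = len(y)
    X = np.ones((n, 1))                # baseline: intercept only
    selected, remaining = [], dict(candidates)
    while remaining:
        # LR statistic for adding each remaining candidate to the current model
        trials = {name: neg2ll(X, y) - neg2ll(np.column_stack([X, col]), y)
                  for name, col in remaining.items()}
        best = max(trials, key=trials.get)
        if stats.chi2.sf(trials[best], df=1) >= alpha:
            break                      # no candidate reduces the deviance enough
        X = np.column_stack([X, remaining.pop(best)])
        selected.append(best)
    return selected

rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # only x1 has a true effect
chosen = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(chosen)
```

Whether a human or a loop decides which model to keep, the downstream inference is conditioned on the same data-driven selection, which is the source of the problems listed above.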

Dave
  • Like I said, I don't think it is the same thing as stepwise regression. We're not talking about throwing 20 variables at the outcome and seeing what sticks. Singer and Willett certainly do not advocate that. They advocate a careful iterative approach. From what I understand their book is one of the pre-eminent texts for performing longitudinal data analysis (https://stats.oarc.ucla.edu/other/examples/alda/) – llewmills Mar 25 '24 at 06:03
  • Your description of their idea is exactly a stepwise-style approach that distorts all inferences downstream of the feature-selection steps, perhaps only mildly but perhaps severely. How do you see their approach as accounting for the variable selection? @llewmills – Dave Mar 27 '24 at 10:32
  • Theory guides the variable selection, as it always should. This is not feature selection as we understand it now, a fishing expedition. It is a careful, measured approach. This discussion has gone into the weeds a bit; I may have to reframe the question in a separate post. – llewmills Mar 28 '24 at 11:20
  • If this process does not select features, then what does it do? $//$ Retaining the smaller model if the likelihood does not improve enough sure sounds like a way to screen out variables. – Dave Mar 28 '24 at 13:42
  • The process tests the effect of predictors. And it differs from what I think of as stepwise selection in that it is not automated. – llewmills Mar 29 '24 at 00:04
  • So stick it in a loop with some conditional logic related to the log-likelihood or p-value, and you have an automated process. The important part about what goes wrong with naïve stepwise regression is not that the process is automated but that the inferences downstream of selection steps should account for the fact that there has been a data-driven variable selection that might not always select the same variables, leading to the issues mentioned in my answer. @llewmills – Dave Mar 29 '24 at 00:13