I have a dataset with more than 20 predictors and a single binary response variable. With only $n=181$ observations (64 deaths, 117 survivors), I decided to fit a penalized logistic regression with all predictors included (so that I avoid the problems associated with model selection). Nevertheless, I also have to produce a "simpler" model (i.e. one that is simple enough to be suitable for a nomogram-style hand calculation in a clinical setting). To that end, I intend to use rms's fastbw.
To exemplify my questions, I'll use the support dataset from Hmisc:
library( rms )
getHdata( support )
fit <- lrm( hospdead ~ rcs( age ) + sex + rcs( meanbp ) + rcs( crea ) + rcs( ph ) + rcs( sod ), data = support, x = TRUE, y = TRUE )
fit
First, I apply penalization:
p <- pentrace( fit, seq( 0, 10, by = 0.01 ) )
plot( p )
fitPen <- update( fit, penalty = p$penalty )
fitPen
I hope I'm correct up to this point.
Next, I validate the model and calculate its calibration curve. If I understand it correctly, I shouldn't validate/calibrate the simpler model directly; rather, I have to run the necessary functions on the original model, but with bw=TRUE. That is:
validate( fitPen, B = 1000, bw = TRUE )
plot( calibrate( fitPen, B = 1000, bw = TRUE ) )
Question #1: Am I correct in this? I.e., is it true that to get the simpler model's validation/calibration I have to run these functions not on the simpler model, but on the original one (with bw=TRUE)? And will the results then pertain to the simpler model, even though I haven't run validation/calibration on the simpler model itself?
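As a side note on inspecting what the bootstrap selection actually did, my understanding (please correct me if I'm wrong) is that validate with bw=TRUE records which factors were retained in each resample, so one can at least check how stable the selection is:

```r
## Hedged sketch: with bw = TRUE, validate() prints (and, if I read the rms
## documentation correctly, stores in the "kept" attribute) which factors were
## retained in each bootstrap resample.
v <- validate( fitPen, B = 200, bw = TRUE )
v                    # includes a summary of factors retained per resample
attr( v, "kept" )    # logical matrix: resamples x factors
```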
Next, I try to come up with the simpler model explicitly. Interestingly, Harrell (1998) uses a method that is based on calculating the logits for the observations, then modeling them with OLS, and then narrowing that model with fastbw. Although this is surely my statistical shortcoming, I simply can't understand why this is necessary.
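For concreteness, the approximation approach as I understand it would look something like the following sketch (my own reconstruction, not the paper's code verbatim; note that the rows used for the OLS fit must match the complete cases used by the original lrm fit):

```r
## Sketch of the logit-approximation approach (my reconstruction): take the
## penalized model's linear predictor as the response of an OLS model on the
## same predictors, then run fastbw on that OLS fit.
vars <- c( "hospdead", "age", "sex", "meanbp", "crea", "ph", "sod" )
sup  <- support[ complete.cases( support[ , vars ] ), ]   # same rows as the lrm fit

lp <- predict( fitPen )   # predicted logits from the penalized fit
fitOls <- ols( lp ~ rcs( age ) + sex + rcs( meanbp ) + rcs( crea ) +
               rcs( ph ) + rcs( sod ), data = sup, sigma = 1 )
fastbw( fitOls, aics = 10000 )   # rank predictors by loss in approximation R^2
```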
Question #2: Why can't we directly use fastbw on the logistic regression model? Such as:
bw <- fastbw( fitPen )
bw
## Note: bw$names.kept contains the bare variable names ("age", not "rcs(age)"),
## so any spline terms have to be restored by hand in the reduced formula.
fitApprox <- lrm( as.formula( paste( "hospdead ~",
                  paste( bw$names.kept, collapse = " + " ) ) ),
                  data = support, x = TRUE, y = TRUE )
And finally, I am not completely sure where I should apply penalization in the whole process.
Question #3: Should I penalize the original model, then run fastbw (see above), and then re-penalize the obtained model? I.e.
p <- pentrace( fitApprox, seq( 0, 10, by = 0.01 ) )
plot( p )
fitApproxPen <- update( fitApprox, penalty = p$penalty )
fitApproxPen
Or do I not have to re-penalize the narrowed model? Or do I not have to penalize the original model at all, it being sufficient to penalize the simpler one? (I suspect that the very first option is the correct one, but I'm not entirely sure.)
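Whichever variant is correct, the end product would then be turned into the nomogram mentioned at the start; a minimal sketch, assuming the re-penalized reduced model fitApproxPen from the code above:

```r
## Hypothetical final step: turn the reduced (re-penalized) model into a
## nomogram for hand calculation; fun = plogis maps the logit scale to a
## probability. nomogram() needs a datadist object to know predictor ranges.
ddist <- datadist( support )
options( datadist = "ddist" )
plot( nomogram( fitApproxPen, fun = plogis,
                funlabel = "Probability of in-hospital death" ) )
```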
Comments:

- Do validate/calibrate (with bw=TRUE) also not take penalization into account? (If fastbw doesn't.) If so, does that mean that the results obtained with validate/calibrate will be misleading in this sense? – Tamas Ferenci Jun 01 '15 at 08:34
- validate and calibrate do take penalization into account, by making the assumption that the optimum penalty is a constant -- the penalty found from running pentrace on the original sample. Besides considering data reduction (masked to $Y$), consider whether the whole exercise is going to yield estimates that have sufficient precision in light of my initial comments. – Frank Harrell Jun 01 '15 at 12:25
- So validate/calibrate extract the value of the penalty from the model that was passed to them? I didn't realize that (although it'd have been easy to check in the source code, I'm sorry). Thank you, everything is clear now! – Tamas Ferenci Jun 02 '15 at 16:53