In predictive modelling, it is useful to apply resampling methods to assess performance, e.g., validate() and calibrate() in the rms package in R. Is it useful or advisable to apply such methods when the primary purpose is estimating effects, e.g., hazard ratios from a Cox model or odds ratios from logistic regression?
1 Answer
If your sole goal is effect estimation, then Frank Harrell says in Section 4.12.2 of Regression Modeling Strategies, 2nd edition:
Model validation is not necessary unless the analyst wishes to use it to quantify the degree of overfitting.
That said, the quality of your effect estimation depends directly on the quality of your model. For your own peace of mind and that of your audience, it can be helpful to quantify the degree of overfitting by model validation. You don't want to base an effect estimate on a severely overfit model.
Evaluating model calibration might help identify problems with model specification (e.g., outcome-associated predictors or interaction terms missing from the model); the effects estimated from a misspecified model might not be correct. In Cox or logistic regression, omitting an outcome-associated predictor can bias the coefficient estimates of the included predictors toward 0, even when the omitted predictor is uncorrelated with them.
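Here is a minimal sketch of that last point, on simulated (hypothetical) data: the coefficient of x1 is attenuated when an independent, outcome-associated covariate x2 is omitted from a logistic regression, a consequence of the non-collapsibility of the odds ratio.

```r
set.seed(1)
n  <- 1e5
x1 <- rnorm(n)          # predictor of interest
x2 <- rnorm(n)          # outcome-associated covariate, independent of x1
y  <- rbinom(n, 1, plogis(1 * x1 + 2 * x2))  # true log odds ratio for x1 is 1

coef(glm(y ~ x1 + x2, family = binomial))["x1"]  # close to 1 (full model)
coef(glm(y ~ x1,      family = binomial))["x1"]  # well below 1 (attenuated)
```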
There's little cost to doing that validation and calibration and potentially much to gain.
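As a concrete sketch of what that looks like with rms for a Cox model (the data frame d, its columns, the number of bootstrap repetitions B, and the 365-day horizon are all hypothetical placeholders):

```r
library(rms)

## Hypothetical data frame `d` with columns time, event, age, sex.
dd <- datadist(d); options(datadist = "dd")

## Fit a Cox model, keeping the design matrix and response (x = TRUE, y = TRUE)
## and the baseline survival estimate (surv = TRUE) so resampling methods work.
f <- cph(Surv(time, event) ~ rcs(age, 4) + sex, data = d,
         x = TRUE, y = TRUE, surv = TRUE, time.inc = 365)

## Bootstrap validation: optimism-corrected Dxy and slope quantify overfitting.
validate(f, B = 200)

## Bootstrap calibration of predicted 365-day survival; miscalibration can flag
## model misspecification (e.g., missing predictors or interactions).
plot(calibrate(f, B = 200, u = 365))
```

A validated slope well below 1, or a calibration curve far from the ideal line, would be a warning against taking the estimated hazard ratios at face value.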
Thanks EdM. I agree, which is why I found it strange that @Frank Harrell stated, "Model validation is not necessary unless the analyst wishes to use it to quantify the degree of overfitting," when in the introduction of his great book he says, "Prediction could be considered a superset of hypothesis testing and estimation." Surely it follows that validating and calibrating the model via resampling is a worthwhile exercise even when the goal is effect estimation. – user167591 Nov 19 '22 at 13:14