I'm conducting a survival analysis using a Cox proportional hazards model. The failure event in the analysis is committing a crime.

I have a binary covariate (1 = yes, 0 = no) for which I get a huge hazard ratio, usually between 2,000 and 4,000 (depending on the specific model).

I checked, and it turns out that around 90% of the observations with the value "yes" for this variable have 'failed' (i.e., committed a crime). So I understand why the HR is so high, but is it problematic for my model? Can such a case be considered overfitting?

Thanks!

Eran

1 Answer

This seems related to the problem of perfect separation in logistic regression. If a set of predictors is adequate to completely determine the outcome, you get very high odds ratios along with high standard errors of the regression coefficients. In your case you perhaps don't have perfect separation but something close to it.

You need to think carefully about this, based on your understanding of the subject matter. It's possible that this is a real association. You might consider a penalized model, which is also a way to deal with perfect separation in logistic regression. You could just penalize the coefficient for this predictor, leaving the others as is. You can do that with a ridge term in a model fit by the R coxph() function, or use the glmnet package.
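
To give a rough feel for what penalizing a single coefficient does, here is a minimal Python sketch. It is not the `coxph()` or glmnet implementation: it fits one binary covariate by Newton-Raphson on the Cox partial log-likelihood with an added ridge term of the form (lam / 2) * beta^2. The toy data, the function names, and the penalty weight `lam = 1.0` are all made up for illustration, and the code assumes no tied event times.

```python
import math


def score_and_information(beta, times, events, x, lam):
    """Score (gradient) and information (negative second derivative) of the
    ridge-penalized Cox partial log-likelihood for one covariate.
    Assumes no tied event times. The penalty is (lam / 2) * beta**2."""
    score = -lam * beta
    info = lam
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations only contribute via risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_beta(times, events, x, lam=0.0, max_iter=100, tol=1e-10, max_step=2.0):
    """Newton-Raphson with a capped step size. With lam = 0 and perfectly
    separated data the estimate would keep growing instead of converging."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x, lam)
        step = max(-max_step, min(max_step, score / info))
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: 9 of the 10 subjects with x = 1 fail early,
# while only 1 of the 10 subjects with x = 0 fails.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20] + [5.5] + [20] * 9
events = [1] * 9 + [0] + [1] + [0] * 9
x = [1] * 10 + [0] * 10

beta_unpenalized = fit_beta(times, events, x, lam=0.0)
beta_ridge = fit_beta(times, events, x, lam=1.0)

print("unpenalized HR:", math.exp(beta_unpenalized))
print("ridge HR:      ", math.exp(beta_ridge))
```

The ridge estimate is pulled toward zero relative to the unpenalized one, and the pull gets stronger as `lam` grows. In R itself, the corresponding model would be written along the lines of `coxph(Surv(time, status) ~ age + ridge(prior, theta = 1))`, where the variable names and the value of `theta` are placeholders.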

The risk is that you've accidentally introduced a problem like survivorship bias into your study. That's why you need to approach this from the perspective of your understanding of the subject matter.

EdM
  • It really seems like a real association - the predictor is previous criminal activity, so it makes sense that if someone already committed a crime, s/he will commit another one. – Eran May 30 '23 at 10:49
  • The classic Rossi recidivism data included the number of prior convictions as a possible predictor in a similar situation. See the worked-through examples in this appendix to "An R and S-PLUS Companion to Applied Regression" by John Fox. That might give you more predictive power and avoid this apparent problem. – EdM May 30 '23 at 12:07