
I have developed a scoring system using logistic regression. The score takes integer values from 0 to 6 and predicts death. Because it does not use a conventional regression formula, I cannot calculate a precise predicted risk of death. An example of the score could look like this:

Score   Dead   Alive
0          0     101
1          1     911
2          3     672
3          2     291
4          8      78
5         10      60
6          5       4

I know that I have to use Pearson's goodness-of-fit test, and I have three cohorts: a development cohort and two independent validation cohorts.

My question is: how do I calculate the Pearson goodness-of-fit statistic in each cohort? In the development cohort, what would my expected mortality be? In the validation cohorts, I guess I could use the observed mortality in the development cohort as the expected mortality.
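For reference, Pearson's statistic for a cohort with K score levels is X^2 = sum over all 2K cells of (O - E)^2 / E, where the expected deaths at score k are n_k * p_k and the expected survivors are n_k * (1 - p_k). The sketch below assumes the expected probabilities p_k are the development cohort's observed mortality, as proposed above; the helper names (`observed_mortality`, `pearson_x2`, `chi2_sf`) are hypothetical. It also illustrates why this cannot work inside the development cohort itself: using a cohort's own observed rates as the expectation makes X^2 zero by construction, so expected mortality there would have to come from a fitted model.

```python
import math

# Development cohort from the question: deaths and survivors at scores 0..6.
DEV_DEAD = [0, 1, 3, 2, 8, 10, 5]
DEV_ALIVE = [101, 911, 672, 291, 78, 60, 4]


def observed_mortality(dead, alive):
    """Observed proportion dead at each score level."""
    return [d / (d + a) for d, a in zip(dead, alive)]


def pearson_x2(dead, alive, p_expected):
    """Pearson X^2 over the 2 x K table of (dead, alive) counts.

    Expected deaths at score k are n_k * p_k and expected survivors are
    n_k * (1 - p_k). A cell whose expected count is 0 adds nothing when its
    observed count is also 0; otherwise X^2 is infinite.
    """
    x2 = 0.0
    for d, a, p in zip(dead, alive, p_expected):
        n = d + a
        for observed, expected in ((d, n * p), (a, n * (1 - p))):
            if expected == 0:
                if observed > 0:
                    return math.inf
                continue
            x2 += (observed - expected) ** 2 / expected
    return x2


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


dev_rates = observed_mortality(DEV_DEAD, DEV_ALIVE)

# Development cohort against its own observed rates: X^2 is 0 up to rounding,
# so this says nothing about fit.
print(pearson_x2(DEV_DEAD, DEV_ALIVE, dev_rates))

# For a validation cohort (counts not given in the question), the call would be:
# x2 = pearson_x2(val_dead, val_alive, dev_rates)
# p_value = chi2_sf(x2, df=len(dev_rates))
```

When the probabilities are fixed in advance from another cohort, X^2 is referred to a chi-square with K degrees of freedom (here K = 7). Two caveats apply to this table: the development mortality at score 0 is exactly 0, so a single death at score 0 in a validation cohort makes X^2 infinite, and several expected death counts are well below 5, where the chi-square approximation is unreliable.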

  • You seem to have a great predictor. When plotted, there is a strong curvilinear relationship between percent dead and your predictor score. Perhaps the Pearson goodness-of-fit test is not the best way to index this fit. – Joel W. Aug 31 '12 at 13:13
  • The chi-square test is not the only option for testing goodness of fit. – Michael R. Chernick Aug 31 '12 at 13:40
  • Often the Hosmer-Lemeshow test is used for testing goodness-of-fit for logistic regression; see link. – RioRaider Aug 31 '12 at 13:46
  • That test has been shown to be arbitrary and to lack competitive power in some situations. I prefer directed tests (e.g., for nonlinearity), but if you want an omnibus test, see the following, as implemented in the residuals.lrm function of the R rms package: Hosmer, D. W., Hosmer, T., le Cessie, S. & Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16, 965-980. – Frank Harrell Aug 31 '12 at 16:43

1 Answer


I don't see how this is a goodness-of-fit problem. It appears to be more of a risk estimation problem, for which you might consider fitting a binary logistic model that is quadratic in the score. @RioRaider: note that the older Hosmer-Lemeshow test is now obsolete.

Frank Harrell