Validation of logistic regression - goodness of fit (pearson)

Question

I have developed a scoring system using logistic regression. The score ranges between 0 and 6 (using integers) and predicts death. It does not use a conventional regression formula and thus I am not able to calculate a precise value of the predicted risk of dying. An example of the score could look like this:

Score    Dead    Alive
0           0      101
1           1      911
2           3      672
3           2      291
4           8       78
5          10       60
6           5        4

I understand that I should use Pearson's chi-square test to assess goodness of fit. I have three cohorts: a development cohort and two independent validation cohorts.

My question is: How do I calculate the Pearson Goodness-of-fit test in each cohort? In the development cohort, what would be my expected mortality? In the validation cohorts, I guess that I could use the observed mortality in the development cohort as expected mortality.
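
One way to set up the calculation asked about here is sketched below, in Python with only the standard library. It assumes the expected mortality is supplied as one risk per score level, and it takes that risk from the development table, which is the suggestion made above for the validation cohorts. The function name `pearson_chi2` and its handling of cells with zero expected count are illustrative choices, not an established procedure. The statistic is the usual sum of (observed - expected)^2 / expected over the dead and alive cells of every score group.

```python
# Development-cohort counts copied from the table in the question.
scores = [0, 1, 2, 3, 4, 5, 6]
dead = [0, 1, 3, 2, 8, 10, 5]
alive = [101, 911, 672, 291, 78, 60, 4]


def pearson_chi2(dead_obs, alive_obs, expected_risk):
    """Pearson X^2 summed over the dead and alive cells of each score group.

    expected_risk holds one assumed probability of death per group.
    """
    x2 = 0.0
    for d, a, p in zip(dead_obs, alive_obs, expected_risk):
        n = d + a
        for observed, expected in ((d, n * p), (a, n * (1 - p))):
            if expected == 0:
                # Illustrative convention: an empty cell that was expected
                # to be empty adds nothing; a non-empty one is an
                # infinitely bad fit.
                if observed != 0:
                    return float("inf")
                continue
            x2 += (observed - expected) ** 2 / expected
    return x2


# Observed mortality per score in the development cohort.
dev_risk = [d / (d + a) for d, a in zip(dead, alive)]

# Checking the development cohort against its own observed rates.
x2_dev = pearson_chi2(dead, alive, dev_risk)
```

Applied back to the development cohort, the statistic is zero up to rounding by construction, so the observed development rates cannot serve as the expected mortality there; the expected values would have to come from a fitted model. For a validation cohort, pass that cohort's dead and alive counts together with `dev_risk`. The score-0 group has no deaths in the development data, so a single death at score 0 in a validation cohort makes the statistic infinite under this convention, and several expected counts are small enough that the chi-square approximation may be unreliable.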

You seem to have a great predictor. When plotted, there is a strong curvilinear relationship between percent dead and your predictor score. Perhaps the Pearson Goodness-of-fit test is not the best way to index this fit. — Joel W., Aug 31 '12 at 13:13
The chi-square test is not the only option for testing goodness of fit. — Michael R. Chernick, Aug 31 '12 at 13:40
Often the Hosmer-Lemeshow test is used for testing goodness-of-fit for logistic regression; see link. — RioRaider, Aug 31 '12 at 13:46
That test has been shown to be arbitrary and not to have competitive power in some situations. I prefer directed tests (e.g., for nonlinearity), but if you want an omnibus test, see the following, as implemented in the R rms package's residuals.lrm function: Hosmer, D. W.; Hosmer, T.; le Cessie, S. & Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 1997, 16, 965-980. — Frank Harrell, Aug 31 '12 at 16:43

score 3 · Answer 1 · answered Aug 31 '12 at 13:51

I don't see where this is a goodness of fit problem. It appears to be more of a risk estimation problem, for which you might consider fitting a binary logistic model that is quadratic in the score. @RioRaider - note that the older Hosmer-Lemeshow test is now obsolete.
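
The quadratic logistic model suggested here could be fitted to the grouped counts from the question roughly as follows. This is a minimal sketch, assuming Newton-Raphson on the binomial log-likelihood and plain Python rather than a statistics package; the helper names `risk` and `solve_linear` are hypothetical, and the starting values, iteration cap, and tolerance are arbitrary choices.

```python
import math

# Development-cohort counts copied from the table in the question.
scores = [0, 1, 2, 3, 4, 5, 6]
dead = [0, 1, 3, 2, 8, 10, 5]
alive = [101, 911, 672, 291, 78, 60, 4]
totals = [d + a for d, a in zip(dead, alive)]

# Design matrix columns: intercept, score, score squared.
X = [[1.0, float(s), float(s * s)] for s in scores]


def risk(beta, row):
    """Inverse logit of the linear predictor, written to avoid overflow."""
    eta = sum(b * x for b, x in zip(beta, row))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


beta = [0.0, 0.0, 0.0]
for _ in range(50):
    p = [risk(beta, row) for row in X]
    # Score vector: X^T (dead - totals * p).
    grad = [
        sum(row[j] * (d - n * pi) for row, d, n, pi in zip(X, dead, totals, p))
        for j in range(3)
    ]
    # Information matrix: X^T diag(totals * p * (1 - p)) X.
    info = [
        [
            sum(row[j] * row[k] * n * pi * (1 - pi)
                for row, n, pi in zip(X, totals, p))
            for k in range(3)
        ]
        for j in range(3)
    ]
    step = solve_linear(info, grad)
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-10:
        break

# Model-based predicted risk of death for each score from 0 to 6.
fitted_risk = [risk(beta, row) for row in X]
```

`fitted_risk` then gives a smoothed predicted risk of death at each score, which is the quantity the scoring system alone does not provide. Because the model includes an intercept, the fitted deaths summed over all score groups match the observed total once the iteration has converged.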
