Goodness of fit for Linear Probability Model (LPM)

Question

I'm running a linear probability model (LPM), i.e. my outcome is binary and I have predictors that are categorical and continuous (I'm aware of some of the pros and cons of using LPM for a binary outcome).

Besides checking the robust standard errors, I was wondering what else I should check for this model.

Since R² is not a good measure for a binary outcome, what can I use to assess goodness of fit for an LPM? Unfortunately I couldn't find much on this topic.

Thanks!

Why isn't $R^2$ a good measure? Yes, $R^2$ loses its usual interpretation when you go nonlinear, such as a logistic regression, but your model is linear. — Dave, Dec 06 '21 at 13:15
What is a linear probability model? What is the optimality criterion used to fit it? If the LPM is just OLS, i.e., minimizes sum of squared errors, then you don't need a goodness of fit test because you already know it doesn't fit---it yields negative probabilities or probabilities > 1. — Frank Harrell, Dec 06 '21 at 13:23

score 2 · Answer 1 · edited Feb 04 '22 at 20:35

I think the answer to your question is to use the 'percent correctly predicted' measure. Quoting directly from Wooldridge's textbook:

"Still, there are ways to use the estimated probabilities (even if some are negative or greater than one) to predict a zero-one outcome. As before, let $\hat{y}_i$ denote the fitted values—which may not be bounded between zero and one. Define a predicted value as $\tilde{y}_i = 1$ if $\hat{y}_i \ge .5$ and $\tilde{y}_i = 0$ if $\hat{y}_i < .5$. Now we have a set of predicted values, $\tilde{y}_i$, $i = 1, \dots, n$, that, like the $y_i$, are either zero or one. We can use the data on $y_i$ and $\tilde{y}_i$ to obtain the frequencies with which we correctly predict $y_i = 1$ and $y_i = 0$, as well as the proportion of overall correct predictions. The latter measure, when turned into a percentage, is a widely used goodness-of-fit measure for binary dependent variables: the percent correctly predicted."

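The rule quoted above can be sketched in a few lines. This is only an illustration, assuming the LPM is fit by plain OLS with NumPy; the function name `percent_correctly_predicted`, the `threshold` argument, and the simulated data are hypothetical and do not come from the textbook.

```python
import numpy as np

def percent_correctly_predicted(y, X, threshold=0.5):
    """Fit an LPM by OLS and classify each observation with a cutoff on the fitted value."""
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column; X may be one predictor (1-D) or several (2-D).
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta                       # fitted values, may fall outside [0, 1]
    y_tilde = (y_hat >= threshold).astype(int)  # predicted 0/1 outcome
    return {
        "overall": np.mean(y_tilde == y),         # share of all correct predictions
        "ones": np.mean(y_tilde[y == 1] == 1),    # share of y = 1 predicted correctly
        "zeros": np.mean(y_tilde[y == 0] == 0),   # share of y = 0 predicted correctly
        "fitted": y_hat,
    }

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-2 * x))).astype(int)

result = percent_correctly_predicted(y, x)
print("overall share correct:", result["overall"])
print("share of ones correct:", result["ones"])
print("share of zeros correct:", result["zeros"])
print("fitted values outside [0, 1]:", int(np.sum((result["fitted"] < 0) | (result["fitted"] > 1))))
```

Reporting the shares for $y_i = 1$ and $y_i = 0$ separately matters when the outcome is unbalanced, because the overall share can then be high even if the rarer class is almost never predicted.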
score 1 · Accepted Answer · answered Dec 06 '21 at 13:30

A goodness of fit test generally refers to comparing the posed model with an ANOVA-type model through replications in the sampling design. This is also referred to as a test for lack of fit. When replications do not exist, pseudo-replicates are obtained by grouping observations that are near one another. The LPM is an OLS model, hence the normal lack-of-fit test is applicable. See, for instance, https://en.wikipedia.org/wiki/Lack-of-fit_sum_of_squares
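As a rough sketch of the lack-of-fit decomposition described above, assuming a single predictor with exact replicates (repeated `x` values) and an OLS fit done with NumPy: the residual sum of squares is split into pure error (variation around the group means) and lack of fit (the remainder). The function name `lack_of_fit_f` and the simulated data are hypothetical. Note that the F reference distribution relies on homoskedastic normal errors, which a binary outcome does not satisfy, so the statistic is best read as a rough diagnostic.

```python
import numpy as np

def lack_of_fit_f(x, y):
    """Lack-of-fit F statistic for a straight-line OLS fit with replicated x values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Residual sum of squares of the linear model (intercept + slope).
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ beta) ** 2)

    # Pure-error sum of squares: deviations from the mean of y at each distinct x.
    levels, idx = np.unique(x, return_inverse=True)
    group_means = np.bincount(idx, weights=y) / np.bincount(idx)
    ss_pe = np.sum((y - group_means[idx]) ** 2)

    ss_lof = sse - ss_pe                 # lack-of-fit sum of squares
    c = len(levels)                      # number of distinct x values
    df_lof, df_pe = c - 2, n - c         # 2 = parameters in the linear model
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe

# Simulated data with a discrete predictor so that replicates exist (illustration only).
rng = np.random.default_rng(1)
x_sim = rng.integers(0, 5, size=400)
y_sim = (rng.uniform(size=400) < 0.1 + 0.15 * x_sim).astype(int)

f_stat, df_lof, df_pe = lack_of_fit_f(x_sim, y_sim)
print("F =", f_stat, "with df =", (df_lof, df_pe))
```

The statistic would be compared against an F distribution with `df_lof` and `df_pe` degrees of freedom. It requires more than two distinct `x` values and at least one replicated value.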

answered Dec 06 '21 at 13:30 by user277126

Why would the normal lack of fit test be applicable when LPM gets the variance structure and distribution incorrect? Goodness of fit is better assessed through directed assessments, e.g., nonlinearity and non-additivity. Replication is not required. – Frank Harrell Dec 06 '21 at 13:34
