How can I measure model performance with weighted logistic regression?

Question

I am working with some survey data that uses probability weights. A number of sources explain that likelihood-based tests and fit statistics like likelihood-ratio, AIC, and BIC are not valid in the context of the weighted MLE.

Are there other tests, statistics, or graphics that one can use in this context to get an idea of how one model performs relative to another model?

Be careful, note that survey weights may often be doing things other than just dealing with a complex survey design. There may also be non-response and post-stratification adjustments in the weights. Remember, the weights are for estimation of population totals, not just the inverted sampling probabilities. Also, did you read the paper in the link? — probabilityislogic, Sep 28 '13 at 21:35
What kind of sampling scheme do you have? I am familiar with methods for Case-cohort and Nested case control sampling. I believe any kind of independent (with replacement) sampling can also be easily dealt with, because the weights are conditionally independent given the data, hence standard empirical likelihood theory can be used to justify e.g. score tests based on the weighted likelihood ratio. — guest47, Apr 01 '13 at 04:25

score -2 · Answer 1 · answered Jan 10 '23 at 12:03

When working with survey data that uses probability weights, traditional likelihood-based tests and fit statistics may not be valid. The weighted log-likelihood is a pseudo-likelihood rather than the true likelihood of the observed sample, so statistics derived from it (the likelihood-ratio test, AIC, BIC) do not have their usual sampling distributions.

There are alternative methods that can be used to compare models in the context of weighted MLE:

Design-based methods: These methods account for the survey design, including its complex sampling structure. For example, you can use the Rao-Scott chi-squared test, which is a modified version of the chi-squared test that is appropriate for survey data.
Bootstrap methods: Bootstrap methods are a family of non-parametric resampling techniques that can be used to make inferences about a population from a sample. You can use bootstrapping to estimate standard errors, confidence intervals, and p-values for model parameters and test statistics.
Weighted least squares: One alternative is to use weighted least squares instead of maximum likelihood. It is often used when the data are heteroskedastic or correlated.
Comparison of predicted probabilities: Another alternative is to compare the predicted probabilities from different models to evaluate their performance. A common method is to calculate the average absolute difference between the two models' predicted probabilities across observations, which shows how much the models disagree.
Information criteria: There are alternative information criteria, such as a “Weighted Akaike information criterion (AIC)” or a “Weighted Bayesian information criterion (BIC)”, that take into account the complex sampling structure of survey data by incorporating the weights into the formula.
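
As a rough illustration of the "comparison of predicted probabilities" idea, here is a minimal Python sketch using only the standard library. It assumes you already have predicted probabilities from two fitted models; the toy values, the variable names, and the helper functions are all made up for the example. The weighted Brier score is an addition of mine rather than something listed above: the mean absolute difference only tells you how much the two models disagree, while a weighted scoring rule (ideally computed on held-out data) gives some indication of which one predicts better.

```python
def weighted_mean(values, weights):
    """Weighted average of `values` using survey weights `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_mean_abs_diff(p_a, p_b, weights):
    """Weighted average absolute difference between two sets of predictions."""
    return weighted_mean([abs(a - b) for a, b in zip(p_a, p_b)], weights)


def weighted_brier(y, p, weights):
    """Weighted mean squared error between 0/1 outcomes and predictions."""
    return weighted_mean([(yi - pi) ** 2 for yi, pi in zip(y, p)], weights)


# Hypothetical toy data: outcomes, survey weights, and predictions
# from two models (model A and model B).
y = [1, 0, 1, 0]
w = [1.0, 2.0, 1.0, 4.0]
p_a = [0.8, 0.3, 0.6, 0.1]
p_b = [0.7, 0.4, 0.5, 0.2]

mad = weighted_mean_abs_diff(p_a, p_b, w)
brier_a = weighted_brier(y, p_a, w)
brier_b = weighted_brier(y, p_b, w)

print(f"weighted mean absolute difference: {mad:.4f}")
print(f"weighted Brier score, model A: {brier_a:.4f}")
print(f"weighted Brier score, model B: {brier_b:.4f}")
```

A lower weighted Brier score suggests better-calibrated predictions for the weighted population, but the point estimates alone say nothing about sampling variability.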

It's important to note that no single method will be the best choice for all cases, and the choice of method will depend on the specifics of your data and research question.
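
To make the bootstrap option more concrete, here is a minimal sketch, again in plain Python, of a percentile bootstrap interval for the difference in weighted Brier scores between two models. Everything in it (function names, toy data, number of replicates) is hypothetical. It also makes a strong simplifying assumption: observations are resampled independently, which ignores any strata or clusters in the design. For a real complex survey you would need a design-aware scheme, for example resampling primary sampling units within strata or using replicate weights if they are supplied with the data.

```python
import random


def weighted_brier(y, p, w):
    """Weighted mean squared error between 0/1 outcomes and predictions."""
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, p, w)) / sum(w)


def bootstrap_brier_diff(y, p_a, p_b, w, n_boot=2000, seed=0):
    """Percentile 95% interval for Brier(A) - Brier(B).

    Assumption: simple resampling of observations with replacement,
    which does NOT reflect stratification or clustering in the design.
    """
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        wb = [w[i] for i in idx]
        score_a = weighted_brier(yb, [p_a[i] for i in idx], wb)
        score_b = weighted_brier(yb, [p_b[i] for i in idx], wb)
        diffs.append(score_a - score_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: outcomes, survey weights, and predictions
# from two models.
y = [1, 0, 1, 0]
w = [1.0, 2.0, 1.0, 4.0]
p_a = [0.8, 0.3, 0.6, 0.1]
p_b = [0.7, 0.4, 0.5, 0.2]

lower, upper = bootstrap_brier_diff(y, p_a, p_b, w)
print(f"95% percentile interval for Brier(A) - Brier(B): [{lower:.4f}, {upper:.4f}]")
```

An interval that lies entirely below zero would suggest model A predicts better than model B under these assumptions. With only four toy observations the interval is not meaningful in itself; the point is the mechanics.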

I have never heard about the "Weighted Akaike information criterion" (AIC) could you give a reference to it? — utobi, Jan 10 '23 at 12:20
@utobi https://link.springer.com/content/pdf/10.3758/BF03206482.pdf — chatGPT, Jan 10 '23 at 18:25
