When working with survey data that uses probability weights, traditional likelihood-based tests and fit statistics (likelihood ratio tests, AIC, BIC) may not be valid. The weighted likelihood is a pseudo-likelihood rather than the true likelihood of the sample, so the distributional assumptions behind these methods no longer hold, and the weights, strata, and clusters also change the variance of the estimates.
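To make this concrete, here is a sketch in notation that is assumed for illustration and not taken from any particular package: $s$ is the sample, $w_i$ the probability weight of unit $i$, and $f(y_i; \theta)$ the model density. Weighted MLE maximizes a weighted sum of log-density contributions:

```latex
\ell_w(\theta) = \sum_{i \in s} w_i \log f(y_i; \theta),
\qquad
\hat{\theta}_w = \arg\max_{\theta} \, \ell_w(\theta)
```

Because $\ell_w$ estimates a population-level log-likelihood and is not the joint log-density of the observed sample, the statistic $2\,[\ell_w(\hat{\theta}_1) - \ell_w(\hat{\theta}_0)]$ for two nested models does not in general follow the usual chi-squared reference distribution.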
Several alternatives can be used to compare models fitted by weighted MLE:
Design-based methods: These methods account for the survey design (weights, strata, and clusters) when computing test statistics. For example, you can use the Rao-Scott chi-squared test, which is a corrected version of the chi-squared test that is appropriate for survey data. Design-based Wald tests serve a similar purpose when comparing nested regression models.
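Bootstrap methods: Bootstrap methods are a family of non-parametric resampling techniques that can be used to make inferences about a population from a sample. You can use bootstrapping to estimate standard errors, confidence intervals, and p-values for model parameters and test statistics. With survey data, the resampling should mimic the design, for example by resampling primary sampling units within strata or by using replicate weights.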
Weighted least squares: Another alternative is to use weighted least squares instead of maximum likelihood. It is often used when the data are heteroskedastic; when the errors are correlated, the generalized least squares form is the usual choice.
Comparison of predicted probabilities: Another alternative is to compare the predicted probabilities from different models to evaluate their performance. A common summary is the average absolute difference between the two models' predicted probabilities for the same observations, ideally computed with the survey weights so that it reflects the population. A value near zero means the two models make nearly the same predictions.
Information criteria: There are alternative information criteria, such as a “weighted Akaike information criterion (AIC)” or “weighted Bayesian information criterion (BIC)”, that take the complex sampling structure of survey data into account by replacing the log-likelihood with its weighted counterpart and adjusting the penalty term for the design.
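Two of the ideas above, the bootstrap and the comparison of predicted probabilities, can be combined in a short sketch. Everything in it is an assumption made for illustration: the data are synthetic, the function names (`fit_weighted_logit`, `weighted_mad`, `bootstrap_mad`) are hypothetical, the models are weighted logistic regressions fitted with a plain Newton iteration in NumPy, and the bootstrap resamples individual rows, which ignores strata and clusters. For a real survey you would more likely resample primary sampling units within strata or use the replicate weights shipped with the data.

```python
import numpy as np


def predict(X, beta):
    """Predicted probabilities from a logistic model."""
    return 1.0 / (1.0 + np.exp(-X @ beta))


def fit_weighted_logit(X, y, w, n_iter=25):
    """Weighted (pseudo-)maximum likelihood logistic fit via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = predict(X, beta)
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def weighted_mad(p_a, p_b, w):
    """Weighted mean absolute difference between two sets of predictions."""
    return np.sum(w * np.abs(p_a - p_b)) / np.sum(w)


def bootstrap_mad(X_a, X_b, y, w, n_boot=200, seed=0):
    """Row-level bootstrap of the weighted MAD (assumes no strata or clusters)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        beta_a = fit_weighted_logit(X_a[idx], y[idx], w[idx])
        beta_b = fit_weighted_logit(X_b[idx], y[idx], w[idx])
        stats[b] = weighted_mad(
            predict(X_a[idx], beta_a), predict(X_b[idx], beta_b), w[idx]
        )
    return stats


# Synthetic data, invented purely for this example.
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
w = rng.uniform(0.5, 3.0, size=n)  # stand-in for probability weights
eta = -0.3 + 0.8 * x1 + 0.5 * x2
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

X_a = np.column_stack([np.ones(n), x1])      # smaller model
X_b = np.column_stack([np.ones(n), x1, x2])  # larger model

beta_a = fit_weighted_logit(X_a, y, w)
beta_b = fit_weighted_logit(X_b, y, w)
mad = weighted_mad(predict(X_a, beta_a), predict(X_b, beta_b), w)

boot = bootstrap_mad(X_a, X_b, y, w)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"weighted MAD = {mad:.3f}, 95% bootstrap interval = ({ci_low:.3f}, {ci_high:.3f})")
```

The percentile interval gives a rough sense of how much the difference between the two models' predictions varies from sample to sample. It measures how far the models disagree, not which one is closer to the truth, so it is best read alongside one of the design-based tests.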
No single method is the best choice in every case; the right one depends on the specifics of your data and research question.