
I'm running a multinomial logistic regression using multinom() from the nnet package in R, and I'd like to find out which predictors I should keep. With that in mind, I have a few questions about the difference between a Wald z test and a likelihood ratio test (lrtest).

I'm aware that the LR test estimates two models and compares their fit: removing predictor variables should make the model fit less well (i.e., give a lower log-likelihood). In R, I'm doing this:

library(lmtest)  # lrtest()

# Predictors to drop one at a time:
indels = c("C.T","A.G","G.A","G.C","T.C","C.A","G.T","A.C","C.G","A.del","TAT.del","TCTGGTTTT.del","TACATG.del","GATTTC.del")

# For each predictor, refit the full model without it and run the LR test:
my_list <- lapply(indels, function(i) lrtest(multinom_model_completo, i))
names(my_list) <- paste0("lrtest_results_", indels)
my_list
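As an aside, I believe car::Anova() would give the same drop-one-predictor LR tests in a single call (a sketch, assuming the car package is installed; untested on my model):

library(car)  # Anova() has a method for multinom fits
Anova(multinom_model_completo)  # one LR chi-square per predictor (Type II tests)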

I'm essentially saving the lrtest result for each of my predictors (C.T, A.G, etc.), and the output looks like this (removing the C.T variable, for example):

my_list$lrtest_results_C.T

Likelihood ratio test

Model 1: clade ~ C.T + A.G + G.A + G.C + T.C + C.A + G.T + A.T + T.A + T.G + A.C + C.G + A.del + TAT.del + TCTGGTTTT.del + TACATG.del + AGTTCA.del + GATTTC.del
Model 2: clade ~ A.G + G.A + G.C + T.C + C.A + G.T + A.T + T.A + T.G + A.C + C.G + A.del + TAT.del + TCTGGTTTT.del + TACATG.del + AGTTCA.del + GATTTC.del
  #Df  LogLik Df  Chisq Pr(>Chisq)  
1 114 -341.93                       
2 108 -349.62 -6 15.387    0.01745 *


Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The p-value obtained is essentially saying that the less restrictive model (the complete model) fits better than the model with the predictor C.T removed; is that correct? Or, since the p-value < 0.05, do we reject H0, meaning that the predictor C.T has an effect on our response variable?
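As a sanity check, the printed statistic can be reproduced by hand from the two log-likelihoods above:

# LR statistic = 2 * (LL_full - LL_reduced) = 2 * (-341.93 - (-349.62)) = 15.38
pchisq(2 * (-341.93 - (-349.62)), df = 114 - 108, lower.tail = FALSE)
# ~0.0175, matching Pr(>Chisq) up to rounding; the 6 df are, as I understand
# it, one C.T coefficient per non-reference clade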

Besides this, what is the difference between this LR test and the Wald z test?

I was calculating the p-values like this:

# Wald z statistics: estimates divided by their standard errors
z = summary(multinom_model_completo)$coefficients / summary(multinom_model_completo)$standard.errors
# Two-sided p-values from the standard normal distribution
p = round(2 * pnorm(abs(z), lower.tail = FALSE), digits = 5)
p

This gives the following result (not all predictors are shown in this output):

                (Intercept)     C.T     A.G     G.A     G.C     T.C     C.A     G.T     A.T     T.A     T.G     A.C     C.G   A.del
20A                 0.00002 0.00000 0.00000 0.00086 0.24904 0.34031 0.81911 0.00000 0.46962 0.07225 0.97985 0.00000 0.02468 0.22201
20B                 0.00000 0.00455 0.00000 0.00000 0.00000 0.00084 0.04848 0.00000 0.89798 0.28322 0.71973 0.00000 0.03414 0.00915
20E (EU1)           0.00000 0.00000 0.03671 0.80517 0.00001 0.25737 0.31973 0.00000 0.24380 0.72968 0.03140 0.00000 0.00006 0.27005
20I (Alpha, V1)     0.00130 0.23054 0.00015 0.08567 0.00001 0.37208 0.00002 0.09600 0.61350 0.07001 0.88457 0.03258 0.36950 0.79801
20J (Gamma, V3)     0.01766 0.45696 0.12776 0.00500 0.04066 0.27809 0.19422 0.28117 0.16725 0.40563 0.51084 0.00000 0.13344 0.08740
21J (Delta)         0.00061 0.62014 0.00000 0.06122 0.05912 0.18687 0.00326 0.70180 0.79703 0.00234 0.02446 0.00008 0.56678 0.11339

...

How can we interpret the p-values for both of these tests, and are they essentially "the same"? My understanding is that the Wald z test requires estimating only one model and tests H0 that a parameter (coefficient) equals a specific value (such as 0), while the LR test checks whether the difference in log-likelihood between the two models is statistically significant, which in turn tells us whether the "removed" variable is helpful to the model. Which of these tests would be better for checking which predictors I should keep?
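One thing I notice is that the LR test above removes all six C.T coefficients at once (Df = -6), while my Wald table gives a separate p-value for C.T in each clade. I think the closer analogue would be a joint Wald test of all the C.T coefficients; here is a sketch, assuming the aod package and that vcov() labels the terms as "clade:predictor" (untested):

library(aod)  # wald.test()
V <- vcov(multinom_model_completo)
b <- as.vector(t(coef(multinom_model_completo)))  # flattened to match V's ordering (assumed)
idx <- grep(":C\\.T$", rownames(V))               # the C.T coefficient for each clade
wald.test(Sigma = V, b = b, Terms = idx)          # joint Wald chi-square on 6 df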

I've checked this SO post: Likelihood Ratio vs Wald test, but I'm still finding it hard to grasp what exactly each method tests.

Thank you in advance! Any help is very welcome.

  • For testing logistic regressions, LR is preferred to Wald. However, selecting variables this way is problematic. – Dave Jan 05 '22 at 20:28

0 Answers