
I have created a Poisson regression model with robust error variance (https://academic.oup.com/aje/article/159/7/702/71883) to calculate relative risks.

This is the Poisson regression model:

glm.poisson <- glm(new_anydiagnosis ~ gender + age + socioeco_status + dsdm_category +
                     family_history + mental_before_hiv + relations + sexual_life +
                     stigma_discrimination,
                   family = poisson(link = log),
                   data = finaldata)

To calculate the robust standard errors, I have used the packages "sandwich" and "lmtest":


library("sandwich")
library("lmtest")

glm.robust <- coeftest(glm.poisson, vcov = sandwich)

And I get the following estimates:

z test of coefficients:

                          Estimate  Std. Error   z value   Pr(>|z|)    
(Intercept)             -1.6661719   0.6595846   -2.5261   0.011534 *  
gender2                  0.2785553   0.2568445    1.0845   0.278130    
age                     -0.0039597   0.0087759   -0.4512   0.651846    
socioeco_status2        -0.2993100   0.2941404   -1.0176   0.308880    
socioeco_status3        -0.2157970   0.3404303   -0.6339   0.526149    
socioeco_status5       -14.2642777   0.5838599  -24.4310  < 2.2e-16 ***
dsdm_category2           0.4010045   0.2631345    1.5240   0.127521    
dsdm_category3           0.3511796   0.5260824    0.6675   0.504429    
dsdm_category4           0.6580096   0.3078882    2.1372   0.032584 *  
family_history2         -0.5735175   0.2567557   -2.2337   0.025502 *  
mental_before_hiv2       0.8308926   0.5205853    1.5961   0.110472    
mental_before_hiv3       1.4173136   0.5843978    2.4253   0.015298 *  
relations2               0.0348879   0.2586927    0.1349   0.892721    
relations3              27.5722112   1.6086563   17.1399  < 2.2e-16 ***
sexual_life2             0.5890264   0.2753578    2.1391   0.032425 *  
sexual_life3           -13.7536832   0.7330256  -18.7629  < 2.2e-16 ***
stigma_discrimination2   0.7728856   0.2483147    3.1125   0.001855 ** 
stigma_discrimination3 -13.9276696   1.0337112  -13.4735  < 2.2e-16 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, now I want to check whether there is multicollinearity among the variables in the model. For this, I was considering using the variance inflation factor ("vif").

But when I try this, I do not get a VIF estimate for each variable in the model. Instead I get only a single set of numbers:

> vif(glm.robust)
  Estimate Std. Error    z value   Pr(>|z|) 
 11.381583   2.220626  10.048533   1.309115 

The problem (I think) is that my glm.robust object is not a fitted model at all, but a matrix of coefficient tests, since it is created by "coeftest".
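Checking the class of the result seems to confirm this (using a small made-up Poisson fit below, since my data cannot be shared):

```r
library(sandwich)
library(lmtest)

# Made-up data and model, only to inspect what coeftest() returns
set.seed(1)
d <- data.frame(y = rpois(100, 1), x1 = rnorm(100), x2 = rnorm(100))
m  <- glm(y ~ x1 + x2, family = poisson(link = log), data = d)
ct <- coeftest(m, vcov = sandwich)

class(ct)       # "coeftest" - a matrix of test results, not a fitted model
is.matrix(ct)   # TRUE
```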

Do you know how I can test for multicollinearity in the Poisson regression model with robust error variance?

I was also considering computing the "vif" for the parameters of the original Poisson model (glm.poisson), but as that model does not use the robust error variance, I am not sure whether that would be appropriate.

I am not at all experienced in using Poisson regression, so maybe I am missing something.

Thanks for your help!


The most useful test for multicollinearity in a generalized linear model, particularly with multi-level categorical predictors like yours, is based on the variance-covariance matrix of the coefficient estimates. If the vif() function you invoked is from the R car package, then you are on the right track.

As I understand the coeftest() function, however, it does not return the robust variance-covariance matrix. It generates and uses that matrix for significance testing of the individual coefficients, but then discards it.

With the car vif() function, you need to pass an object whose variance-covariance matrix (along with its model matrix and coefficients) is available in the usual way, via vcov(object) and friends. This might already be implemented somewhere, but you could generate the robust matrix directly with the sandwich package, give your fitted model a new class, and write a vcov() method for that class that simply returns the robust matrix. Then submit the re-classed model to vif(). Alternatively, you could adapt the code of vif() to work directly on the matrix.
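A minimal sketch of that wrapper idea follows; the toy data and the class name glm_robust are my own invention (the question's model and data are not shown), not anything built into car or sandwich:

```r
library(sandwich)
library(car)

# Toy Poisson fit standing in for the question's glm.poisson
set.seed(1)
d <- data.frame(
  y  = rpois(200, 1),
  x1 = rnorm(200),
  x2 = factor(sample(1:3, 200, replace = TRUE))
)
fit <- glm(y ~ x1 + x2, family = poisson(link = log), data = d)

# Robust variance-covariance matrix from the sandwich estimator
V <- sandwich(fit)

# Re-class the fitted model and give the new class a vcov() method that
# returns the robust matrix, so vif() picks up the robust covariances
robust_fit <- fit
robust_fit$robust_vcov <- V
class(robust_fit) <- c("glm_robust", class(fit))
vcov.glm_robust <- function(object, ...) object$robust_vcov

gvifs <- vif(robust_fit)   # generalized VIFs based on the robust matrix
print(gvifs)
```

Because x2 is a multi-level factor, vif() here returns generalized VIFs (GVIF) with their degrees of freedom, one row per model term.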

All that said, it's not clear what you gain by evaluating multicollinearity at this stage of the analysis. All (non-perfect) multicollinearity does is to increase the variance of the individual coefficient estimates of correlated predictors. If the model is working well, do you really care about that variance inflation? How would you change the direction of your study based on what you found?

EdM