
For inverse probability weighting (IPW) in R, the use of survey::svyglm is well established. I want to compare the results of 10K+ (!) models with and without IPW weights (i.e. sampling weights, aka 'pweights' in Stata).

From Lumley (2010) we learn that the HC0 standard errors "are almost identical to the design-based standard errors of [survey::svyglm]; the main difference being in the handling of stratification".

However, when comparing two models without weights and without strata, the results are still only "almost identical".

fo <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species

dsg <- survey::svydesign(ids=~1, weights=~1, data=iris)
summary(survey::svyglm(fo, design=dsg))$coe

                    Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        2.1712663 0.24279816  8.942680 1.692962e-15
Sepal.Width        0.4958889 0.07861740  6.307623 3.278723e-09
Petal.Length       0.8292439 0.07081753 11.709585 1.141180e-22
Petal.Width       -0.3151552 0.16137075 -1.952988 5.276000e-02
Speciesversicolor -0.7235620 0.23345593 -3.099351 2.333076e-03
Speciesvirginica  -1.0234978 0.34165977 -2.995664 3.225249e-03

fit <- lm(fo, iris)
cbind(lmtest::coeftest(fit, vcov.=sandwich::vcovHC(fit, type='HC0')))

                    Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        2.1712663 0.24198747  8.972639 1.422017e-15
Sepal.Width        0.4958889 0.07835490  6.328754 2.946009e-09
Petal.Length       0.8292439 0.07058108 11.748813 9.004061e-23
Petal.Width       -0.3151552 0.16083194 -1.959531 5.198181e-02
Speciesversicolor -0.7235620 0.23267644 -3.109735 2.257633e-03
Speciesvirginica  -1.0234978 0.34051900 -3.005700 3.126834e-03

In my project, the survey::svyglm approach without IPW weights yields significant BH-adjusted p-values for the treatment variable in 7.6% of the 10K+ models, whereas the lm/HC0 approach yields 6.2%.

Consequently the results differ considerably depending on the approach.
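As a toy sketch of how such shares are computed (`p_values` here is a small placeholder vector, not the project's actual treatment p-values):

```r
# Toy illustration of the Benjamini-Hochberg step referred to above.
# `p_values` is a placeholder; in the project it would hold the
# treatment-variable p-values from the 10K+ models.
p_values  <- c(0.01, 0.02, 0.03, 0.5)
p_bh      <- p.adjust(p_values, method = "BH")  # BH/FDR adjustment
share_sig <- mean(p_bh < 0.05)                  # share flagged at 5% FDR
share_sig
```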

I am inclined to report the survey::svyglm values, since then the only difference between the IPW and the naive model is the use of IPW weights.

However, if I had not intended to compare IPW with naive models, I would not have used survey::svyglm and thus would have arrived at different results.

So my question is: how exactly are these "design-based standard errors" calculated when there are neither weights nor strata in the model? Are they also valid, or even preferable, without weights and strata? And how do they differ from the HC0 standard errors?

jay.sf
    This blog post gives an explanation of what svyglm() is doing and explains the sandwich estimator it uses for standard errors: https://www.practicalsignificance.com/posts/survey-regression-what-we-estimate/ This paper by Dr. Lumley (the author of the 'survey' package) provides a much more detailed explanation: https://projecteuclid.org/journals/statistical-science/volume-32/issue-2/Fitting-Regression-Models-to-Survey-Data/10.1214/16-STS605.pdf – bschneidr Mar 06 '23 at 21:56

1 Answer


They are scaled by different constants. Multiply the variances from the svyglm object by $149/150$, i.e. (nrow(iris) - 1)/ nrow(iris), to get them to match exactly:

all.equal((nrow(iris) - 1) / nrow(iris) * vcov(survey::svyglm(fo, design=dsg)),
          sandwich::vcovHC(fit, type = "HC0"))
# [1] TRUE
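For reference, a minimal sketch of the HC0 sandwich that both estimators compute (re-fitting `fit` from the question so the snippet is self-contained); per the scaling above, svyglm's design-based variance is this same sandwich multiplied by $n/(n-1)$:

```r
# Minimal sketch: the HC0 sandwich computed by hand for the lm fit.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
          data = iris)
X <- model.matrix(fit)            # design matrix
u <- residuals(fit)               # OLS residuals
bread <- solve(crossprod(X))      # (X'X)^{-1}
meat  <- crossprod(X * u)         # X' diag(u^2) X
hc0   <- bread %*% meat %*% bread # HC0 sandwich
all.equal(hc0, sandwich::vcovHC(fit, type = "HC0"),
          check.attributes = FALSE)
# svyglm's design-based variance is this sandwich times n/(n-1).
```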
psboonstra