
I am running a pooled OLS regression using the plm package in R. My question is more about basic statistics, though, so I am trying to post it here first ;)

Since my regression yields heteroskedastic residuals, I would like to use heteroskedasticity-robust standard errors. From coeftest(mod, vcov. = vcovHC(mod, type = "HC0")) I get a table containing estimates, standard errors, t-values, and p-values for each independent variable; these are essentially my "robust" regression results.

To discuss the importance of the different variables, I would like to plot the share of variance explained by each independent variable, so I need the respective sums of squares. However, when using aov(), I don't know how to tell R to use robust standard errors.

Now my question is: how do I get an ANOVA table/sums of squares that correspond to robust standard errors? Is it possible to calculate them from the ANOVA table of the regression with ordinary standard errors?

Edit:

In other words, and disregarding my R issues:

If $R^2$ is not affected by using robust standard errors, will the respective contributions of the different explanatory variables to the explained variance also be unchanged?

Edit:

In R, does aov(mod) actually give a correct ANOVA table for a panel model (plm)?

Aki

2 Answers


The ANOVA in linear regression models is equivalent to the Wald test (and the likelihood ratio test) of the corresponding nested models. So when you want to conduct the corresponding test using heteroskedasticity-consistent (HC) standard errors, it cannot be obtained from a decomposition of the sums of squares; instead, you can carry out the Wald test using an HC covariance estimate. This idea is used in Anova() and linearHypothesis() from the car package as well as coeftest() and waldtest() from the lmtest package. The latter three can also be used with plm objects.

A simple (albeit not very interesting/meaningful) example is the following. We use the standard model from the ?plm manual page and want to carry out a Wald test for the significance of both log(pcap) and unemp. We need these packages:

library("plm")
library("sandwich")
library("car")
library("lmtest")

The model (under the alternative) is:

data("Produc", package = "plm")
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
  data = Produc, index = c("state", "year"))

First, let's look at the marginal Wald tests with HC standard errors for all individual coefficients:

coeftest(mod, vcov = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
log(pc)    0.2920069  0.0617425  4.7294 2.681e-06 ***
log(emp)   0.7681595  0.0816652  9.4062 < 2.2e-16 ***
log(pcap) -0.0261497  0.0603262 -0.4335   0.66480    
unemp     -0.0052977  0.0024958 -2.1226   0.03411 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And then we carry out a Wald test for both log(pcap) and unemp:

linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC)

Linear hypothesis test

Hypothesis:
log(pcap) = 0
unemp = 0

Model 1: restricted model
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Alternatively, we can also fit the model under the null hypothesis (mod0 say) without the two coefficients and then call waldtest():

mod0 <- plm(log(gsp) ~ log(pc) + log(emp),
  data = Produc, index = c("state", "year"))
waldtest(mod0, mod, vcov = vcovHC)

Wald test

Model 1: log(gsp) ~ log(pc) + log(emp)
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp
  Res.Df Df  Chisq Pr(>Chisq)  
1    766                       
2    764  2 7.2934    0.02608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test statistic and p-value computed by linearHypothesis() and waldtest() are exactly the same; only the interface and output formatting differ somewhat. In some cases one or the other is simpler or more intuitive.

Note: If you supply a covariance matrix estimate (i.e., a matrix like vcovHC(mod)) instead of a covariance matrix estimator (i.e., a function like vcovHC), make sure that you supply the HC covariance matrix estimate of the model under the alternative, i.e., the non-restricted model.
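As a small illustration of this note, both calls below should give the same result for the model above: the first supplies the estimator function (evaluated internally), the second the estimate computed from the unrestricted model mod.

```r
## Covariance matrix *estimator* (a function): evaluated internally by linearHypothesis()
linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC)

## Covariance matrix *estimate* (a matrix): must come from the unrestricted model mod
linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC(mod))
```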

Achim Zeileis
    If I understand it correctly, the Wald-test shows whether including certain variables is significant or not. But is there a measure of how much they actually improve the model, as e.g. sum of squares? – Aki Jan 07 '15 at 14:41
  • How can I implement HAC standard errors? I tried coeftest(mod, vcov = vcovHAC) but got an error message saying "no applicable method for 'estfun' applied to an object of class "c('plm', 'panelmodel')". It seems I need to calculate weights or an estimation function first. What method would you recommend? – Aki Feb 07 '16 at 16:45
  • While the plm package has methods for the vcovHC() generic from the sandwich package, it does not provide methods for vcovHAC(). Instead plm ships with its own dedicated functions for computing covariance matrices in panel models that potentially include serial correlation as well. See vcovNW() or vcovSCC() in package plm. – Achim Zeileis Feb 07 '16 at 23:03
  • Thank you! As far as I understand both functions relate to autocorrelation-robust SE. Is there any function that can be used for both heteroscedasticity- and autocorrelation-robust SE? – Aki Feb 08 '16 at 08:12
  • All three functions (vcovHAC, vcovNW, vcovSCC) are _H_eteroskedasticity and _A_utocorrelation _C_onsistent...that's what HAC stands for. – Achim Zeileis Feb 08 '16 at 11:31
  • OK. In the plm description it sounded like they were only correcting for autocorrelation, since it said nothing about heteroskedasticity. coeftest(mod, vcov = vcovNW) gives the error message "object 'vcovNW' not found", although all packages are loaded. Also the example in the plm-package description gives the same error. What am I doing wrong here? – Aki Feb 08 '16 at 13:39
  • You need to check the corresponding papers which types of correlation and/or heteroskedasticity are exactly accounted for and which are not. There are more possible patterns in panel data than in cross-section data - and the details differ between the different estimators. And vcovNW was added in recent versions of plm (1.5-x). – Achim Zeileis Feb 08 '16 at 15:48
  • Will this approach work with mlm objects from multivariate regressions? – alexpghayes Jun 09 '22 at 19:51
  • Conceptually, the same strategy could be employed: vcovHC() works for mlm objects and the same idea for the Wald test can be employed. However, because in mlm the coef() are stored as a matrix as opposed to a vector with the same naming convention as the vcov(), the waldtest() function does not work out of the box. It might be possible to write a waldtest.mlm() method that uses the same trick as the coeftest.mlm() method (computing the vcov first, replacing the coef matrix with a vector, and then calling the default method) but I would need to have a closer look at this... – Achim Zeileis Jun 10 '22 at 02:55
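To tie the comments above together, a sketch of heteroskedasticity- and autocorrelation-robust inference using plm's own dedicated covariance estimators (this assumes plm >= 1.5, which provides vcovNW()):

```r
library("plm")
library("lmtest")

data("Produc", package = "plm")
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
  data = Produc, index = c("state", "year"))

## Newey-West style HAC covariance: robust to heteroskedasticity and
## serial correlation within cross-sectional units
coeftest(mod, vcov = vcovNW)
```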

This can be done with the Anova function in the car package. See this excellent answer for more detail and a review of other techniques for dealing with heteroskedasticity in ANOVA.
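As a sketch of what this looks like for a plain lm() fit: car::Anova() supports heteroskedasticity-corrected tests via its white.adjust argument (whether this carries over to plm objects is discussed in the comments). The example below reuses the Produc data from the other answer.

```r
library("car")

## Plain OLS fit on the Produc data shipped with the plm package
data("Produc", package = "plm")
fit <- lm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp, data = Produc)

## Type-II tests with a heteroskedasticity-corrected (HC3) covariance matrix
Anova(fit, white.adjust = "hc3")
```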

shadowtalker
  • Thank you. The problem is, that Anova() does not seem to work with models of type plm (panelmodel). – Aki Jan 06 '15 at 16:48
  • @Aki if I'm not mistaken the pooled OLS is equivalent to plain OLS, based on what it says in the vignette: http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf – shadowtalker Jan 06 '15 at 16:55
  • @Aki however it sounds like you might be interested in a richer ANOVA model. There are some R examples here: http://stats.stackexchange.com/questions/3874/unbalanced-mixed-effect-anova-for-repeated-measures – shadowtalker Jan 06 '15 at 18:47