
I am working with economic data and trying to create a linear regression model for forecasting purposes. The dependent variable is measured in percentage changes, and I've differenced the independent variables to achieve stationarity. The model takes the following form:

$y_t = β_{0} + β_{1} x_{1,t} + β_{2} x_{2,t} + ϵ_t$

The problem is the model has some autocorrelation (Durbin Watson statistic is around 0.9). I am trying to find relatively easy ways to account for the autocorrelation while keeping the simplicity that comes with linear regression. A few things to note:

  1. I tested the variable significance using HAC standard errors so I have reason to believe the variable significance is not spurious
  2. I know a lagged dependent variable can be added to account for autocorrelation, but I am trying to avoid that due to all the problems that come with using lagged dependent variables

With that being said, based on some research I've done, it seems like using ARIMA errors could be a good solution. However, I'm struggling to interpret the results. My original model produces the following lm() output:

```
Call:
lm(formula = vars$dependent ~ vars$independent1 + vars$independent2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030379 -0.005512  0.000417  0.009543  0.032225

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.005916   0.001396   4.236 5.62e-05 ***
vars$independent1 -0.036315   0.003638  -9.981 4.42e-16 ***
vars$independent2 -0.019642   0.003578  -5.489 3.93e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01259 on 87 degrees of freedom
Multiple R-squared:  0.7158,    Adjusted R-squared:  0.7093
F-statistic: 109.6 on 2 and 87 DF,  p-value: < 2.2e-16
```

Now the idea is I want to keep this model, but account for the autocorrelation by modeling the residuals using ARIMA. To do this, I did:

```r
residualmodel <- arima(vars$dependent,
                       xreg = vars[, c("independent1", "independent2")],
                       order = c(1, 0, 0))
```

Which had the output:

```
Call:
arima(x = vars$dependent, order = c(1, 0, 0), xreg = vars[, c("independent1", "independent2")])

Coefficients:
         ar1  intercept  independent1  independent2
      0.7796     0.0018       -0.0236       -0.0047
s.e.  0.0867     0.0047        0.0054        0.0051

sigma^2 estimated as 9.612e-05:  log likelihood = 288.07,  aic = -566.15
```

However, I'm confused about how I can interpret this. Does this mean that the model is:

$y_t = 0.0018 + (-0.0236) x_{1,t} + (-0.0047) x_{2,t} + ϵ_t$

where:

$ϵ_t = 0.78 ϵ_{t-1} + μ_t$

where $μ_t$ is now a white-noise error term.

Is this the correct interpretation? If so, how can I determine the strength of the model using typical metrics like $R^2$? It seems like in my attempt to correct the autocorrelation I've lost the benefits that come with simple linear regression.

Also, if this isn't a good way to tackle autocorrelation, any advice is appreciated here. It seems difficult to find a good "go-to" means of correcting for this issue.

Amy K
    0.7796 rounds to 0.78, not 0.77. – Richard Hardy Feb 29 '24 at 16:04
  • Regarding your "I am trying to avoid that due to all the problems that come with using lagged dependent variables," may I ask which problems you're concerned about? – Durden Feb 29 '24 at 16:38
  • I meant the problems that come from adding lagged dependent variables to standard linear regression models (not ARIMA models). Lagged DVs in linear regression tend to be problematic because they diminish the effect of the independent variables which are more important (at least in economics) – Amy K Feb 29 '24 at 19:51

2 Answers


Yes, I think you are right: the errors have the AR(1) form.

https://otexts.com/fpp2/regarima.html

It might be better to compare in-sample or out-of-sample error metrics.

Dirk N

Yes, your interpretation is completely correct.

This is absolutely a valid way to go about this. Alternatively, you can throw everything at once into forecast::auto.arima(), by feeding your regressors into the xreg parameter. This fits a regression with ARIMA errors, and as a bonus automatically determines the errors' ARIMA model form.
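A minimal sketch of that approach (untested; it assumes `vars` is the data frame from your question):

```r
# Regression with automatically selected ARIMA errors.
library(forecast)

# xreg must be a numeric matrix of the regressors.
xreg <- as.matrix(vars[, c("independent1", "independent2")])

fit <- auto.arima(vars$dependent, xreg = xreg)
summary(fit)         # regression coefficients plus the chosen ARIMA error form
checkresiduals(fit)  # Ljung-Box test and residual diagnostics
```

If `auto.arima()` settles on an ARIMA(1,0,0) error structure, you should get essentially the same fit as your `arima()` call above.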

$R^2$, as a measure of in-sample fit, is not used often by forecasters. Assessing forecast accuracy on holdout data is more frequently used and, most forecasters would argue, more meaningful. Take a look at the forecasting literature given at Resources/books for project on forecasting models.
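To illustrate the holdout idea, you could hold out (say) the last 12 observations, fit on the rest, and compare the forecasts against the held-out actuals; again a sketch assuming `vars` is your data frame:

```r
library(forecast)

n     <- nrow(vars)
train <- 1:(n - 12)          # fit on all but the last 12 observations
test  <- (n - 11):n          # hold out the last 12
xreg  <- as.matrix(vars[, c("independent1", "independent2")])

fit <- Arima(vars$dependent[train], order = c(1, 0, 0),
             xreg = xreg[train, ])
fc  <- forecast(fit, xreg = xreg[test, ])

# RMSE, MAE, etc. on both the training sample and the holdout.
accuracy(fc, vars$dependent[test])
```

Note that this assumes the future values of the regressors are known at forecast time; if they are not, you would need to forecast them as well, which adds to the error.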

Stephan Kolassa