I am working with economic data and trying to build a linear regression model for forecasting. The dependent variable is measured as a percentage change, and I've differenced the independent variables to achieve stationarity. The model takes the form:
$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \epsilon_t$
The problem is that the model exhibits autocorrelation (the Durbin–Watson statistic is around 0.9). I am looking for relatively easy ways to account for the autocorrelation while keeping the simplicity that comes with linear regression. A few things to note:
- I tested the variable significance using HAC standard errors (see the sketch after this list), so I have reason to believe the variable significance is not spurious
- I know a lagged dependent variable can be added to account for autocorrelation, but I am trying to avoid that due to all the problems that come with using lagged dependent variables
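For reference, here is roughly how the HAC-based significance test can be run — a minimal sketch using the sandwich and lmtest packages, assuming vars holds the three series (the Newey–West estimator is one common HAC choice; I may have used another):

library(lmtest)    # coeftest()
library(sandwich)  # NeweyWest()

# Original OLS fit
fit <- lm(vars$dependent ~ vars$independent1 + vars$independent2)

# Coefficient tests with Newey-West (HAC) standard errors
coeftest(fit, vcov. = NeweyWest(fit))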
With that being said, based on some research I've done, it seems like regression with ARIMA errors could be a good solution. However, I'm struggling to interpret the results. My original model has the following lm() output:
Call:
lm(formula = vars$dependent ~ vars$independent1 + vars$independent2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030379 -0.005512  0.000417  0.009543  0.032225

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.005916   0.001396   4.236 5.62e-05 ***
vars$independent1 -0.036315   0.003638  -9.981 4.42e-16 ***
vars$independent2 -0.019642   0.003578  -5.489 3.93e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01259 on 87 degrees of freedom
Multiple R-squared:  0.7158,  Adjusted R-squared:  0.7093
F-statistic: 109.6 on 2 and 87 DF,  p-value: < 2.2e-16
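The Durbin–Watson statistic mentioned above can be computed on this fit with lmtest (a minimal sketch, reusing the fit object from the HAC example):

library(lmtest)  # dwtest()

# Durbin-Watson test on the OLS residuals; a statistic near 0.9
# (well below 2) points to positive first-order autocorrelation
dwtest(fit)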
Now the idea is that I want to keep this model but account for the autocorrelation by modeling the error term as an ARIMA process. To do this, I ran:
residualmodel <- arima(vars$dependent, xreg = vars[, c("independent1", "independent2")], order = c(1, 0, 0))
Which had the output:
Call:
arima(x = vars$dependent, order = c(1, 0, 0), xreg = vars[, c("independent1", "independent2")])

Coefficients:
         ar1  intercept  independent1  independent2
      0.7796     0.0018       -0.0236       -0.0047
s.e.  0.0867     0.0047        0.0054        0.0051

sigma^2 estimated as 9.612e-05:  log likelihood = 288.07,  aic = -566.15
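As a sanity check on whether the AR(1) term has absorbed the autocorrelation, one can run a Ljung–Box test on the model's innovations (a sketch; the lag choice here is arbitrary, and fitdf = 1 accounts for the estimated AR coefficient):

# Ljung-Box test on the innovations of the regression-with-AR(1)-errors fit
Box.test(residuals(residualmodel), lag = 10, type = "Ljung-Box", fitdf = 1)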
However, I'm confused about how to interpret the arima() output itself. Does it mean that the model is:
$y_t = 0.0018 + (-0.0236)\, x_{1,t} + (-0.0047)\, x_{2,t} + \epsilon_t$
where:
$\epsilon_t = 0.78\, \epsilon_{t-1} + \mu_t$
where $\mu_t$ is a white-noise error term.
Is this the correct interpretation? If so, how can I determine the strength of the model using typical metrics like R-squared? It seems that in my attempt to correct the autocorrelation I've lost the benefits that come with simple linear regression.
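The closest thing I can come up with is a hand-rolled pseudo R-squared from the ARIMA fit's one-step residuals (a sketch; I'm not sure this is a standard or well-calibrated metric):

# Pseudo R-squared: 1 - SSE/SST, using the innovations from the
# regression-with-ARIMA-errors fit
res <- residuals(residualmodel)
sst <- sum((vars$dependent - mean(vars$dependent))^2)
1 - sum(res^2) / sst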
Also, if this isn't a good way to tackle autocorrelation, any advice is appreciated here. It seems difficult to find a good "go-to" means of correcting for this issue.