I am running an OLS on which the dependent variable (Y) and the independent variables (X1, X2, X3, ...) are non-stationary. But the residuals are found to be stationary. Does this mean my regression is OK? Or is it spurious? I tried searching online but I got a bit confused.
-
Do all of your variables have unit roots? Or what kind of nonstationarity is it? – Richard Hardy Jan 08 '20 at 10:41
-
The variables can be non-stationary provided their relation doesn't change over time (if it does and you don't model it, you will see this in the residuals, they will not be stationary). A simple example of your case where OLS is fine would be $Y_t = \beta t + u_t$ with $u_t$ stationary. If you look into "co-integration" you can find the relevant time series theory. – CloseToC Jan 08 '20 at 12:18
-
@RichardHardy yes they have unit root but the residuals are stationary. Is that ok or problematic? – adrCoder Jan 08 '20 at 13:33
-
Your point estimates are kind of OK, but probably not the standard errors, confidence intervals, p-values, etc. You would be safer to use an error correction model (ECM; see [tag:ecm]) $$\Delta y_t=\gamma_0 + \alpha \text{ect}_{t-1}+\gamma_1 \Delta x_{1,t}+\dots+\gamma_p \Delta x_{p,t}$$ where $\text{ect}$ is the error-correction term (a stationary linear combination of $y$ and $x$s). – Richard Hardy Jan 08 '20 at 14:14
-
@RichardHardy do you have some reference (book, paper, online article) that states and justifies why the point estimates are ok in case the variables are non-stationary but the residuals are stationary? – adrCoder Jan 23 '20 at 07:38
-
Most time series textbooks include a treatment of regression of cointegrated variables. Superconsistency is one keyword to look for (characterizing the point estimator in such a regression). – Richard Hardy Jan 23 '20 at 07:41
-
@RichardHardy thank you. Can you please recommend some good books so that I order them and read them? – adrCoder Jan 23 '20 at 12:07
-
See these references, also Lütkepohl "New Introduction to Multiple Time Series Analysis" (2005). – Richard Hardy Jan 23 '20 at 13:27
-
Many thanks Richard! – adrCoder Jan 23 '20 at 14:24
1 Answer
...the dependent variable (Y) and the independent variables (X1, X2, X3, ...) are non-stationary. But the residuals are found to be stationary. Does this mean my regression is OK?...
This suggests there is a cointegrating relationship among the dependent and independent variables. In this case, the OLS estimates are superconsistent.
For example, consider the simplest cointegration model: $$ y_t = \beta x_t + \epsilon_t $$ where $\epsilon_t \stackrel{i.i.d.}{\sim} (0,1)$ and $u_t \stackrel{i.i.d.}{\sim} (0,1)$ are two independent white noise sequences and $$ x_t = \sum_{s = 0}^t u_s $$ is a random walk with $\{ u_t \}$ as innovations.
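This DGP takes only a few lines to simulate (a minimal sketch; the seed and the choice $\beta = 2$ are arbitrary, purely for illustration):

```python
import numpy as np

# Simulate the cointegration DGP above: x_t is a random walk and
# y_t = beta * x_t + eps_t, with eps_t and u_t independent N(0,1).
# beta = 2 is an arbitrary illustrative value.
rng = np.random.default_rng(0)
T = 500
beta = 2.0

u = rng.standard_normal(T)        # innovations of the random walk
eps = rng.standard_normal(T)      # stationary error term
x = np.cumsum(u)                  # x_t = u_0 + u_1 + ... + u_t
y = beta * x + eps                # cointegrated with vector (1, -beta)

# x and y are each I(1), but the combination y - beta*x is stationary:
ect = y - beta * x                # equals eps by construction
print(ect.std())                  # close to 1, the std. dev. of eps_t
```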
Both $x_t$ and $y_t$ are non-stationary, but the population error term $y_t - \beta x_t$ is stationary. There is a cointegrating relationship with cointegrating vector $(1, -\beta)$.
The OLS estimate $\hat{\beta}$ satisfies $$ T ( \hat{\beta} - \beta ) = \frac{\sum_1^T x_t \epsilon_t}{\sum_1^T x_t^2} \stackrel{d}{\rightarrow} \frac{\int_0^1 W^{(1)}_t dW^{(2)}_t}{\int_0^1 \left( W^{(1)}_t \right)^2 dt}, $$ where $W^{(1)}$ and $W^{(2)}$ are independent standard Brownian motions. Therefore $\hat{\beta} - \beta = O_p(\frac{1}{T})$, i.e. $\hat{\beta}$ is T-consistent. In contrast, in the stationary case, $\hat{\beta}$ is $\sqrt{T}$-consistent.
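The $O_p(1/T)$ rate can be seen in a small Monte Carlo sketch: if the error were $O_p(1/\sqrt{T})$, multiplying the sample size by 8 would shrink the mean absolute error by about $\sqrt{8} \approx 2.8$; under superconsistency it shrinks by about 8. (Sample sizes, replication count, and seed are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0

def mean_abs_error(T, reps=200):
    """Average |beta_hat - beta| over `reps` simulated cointegrated samples."""
    errs = []
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))   # random walk regressor
        y = beta * x + rng.standard_normal(T)   # cointegrated with x
        beta_hat = (x @ y) / (x @ x)            # OLS without intercept
        errs.append(abs(beta_hat - beta))
    return float(np.mean(errs))

e_small = mean_abs_error(200)
e_large = mean_abs_error(1600)   # 8x the sample size
print(e_small / e_large)         # near 8 under T-consistency, not sqrt(8)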
Here (T-)consistency holds even when $\epsilon_t$ and $u_t$ are correlated. This is in contrast to the stationary case where consistency requires $E[\epsilon_t x_t] = 0$.
Unlike the stationary case, where inference is obtained via asymptotic normal distributions, here the asymptotic distribution of $T ( \hat{\beta} - \beta )$ is non-normal. Same goes for the asymptotic distribution of the usual Wald/F-statistics (critical values can be obtained by simulation, if needed).
Notice that sum of squared residuals of this cointegration regression is $$ \sum_1^T \epsilon_t ^2 - \frac{( \sum_1^T x_t \epsilon_t)^2}{\sum_1^T x_t^2} = O_p(T) - O_p(1) = O_p(T), $$ which implies the residuals are stationary. On the other hand, if the regression is spurious, the sum of squared residuals is $$ \sum_1^T y_t ^2 - \frac{( \sum_1^T x_t y_t)^2}{\sum_1^T x_t^2} = O_p(T^2) - O_p(T^2) = O_p(T^2) $$ which implies the residuals are non-stationary. The fact that you found residuals to be stationary suggests your regression is cointegrated, rather than spurious.
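The contrast between the two cases is easy to reproduce numerically. Below, a cointegrated regression is compared with a spurious one (regressing one random walk on an independent random walk); the seed and sample size are illustrative choices. The scaled sum of squared residuals stays bounded in the cointegrated case and blows up in the spurious case:

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta = 2000, 2.0

x = np.cumsum(rng.standard_normal(T))        # I(1) regressor
y_coint = beta * x + rng.standard_normal(T)  # cointegrated with x
z = np.cumsum(rng.standard_normal(T))        # independent random walk

def ols_resid(y, x):
    """Residuals from OLS of y on x without intercept."""
    b = (x @ y) / (x @ x)
    return y - b * x

r_coint = ols_resid(y_coint, x)   # stationary: SSR = O_p(T)
r_spur = ols_resid(z, x)          # non-stationary: SSR = O_p(T^2)

print((r_coint**2).sum() / T)     # roughly constant as T grows
print((r_spur**2).sum() / T)      # grows with T
```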
Further Comments
When applying unit root tests to residuals to check for non-stationarity, standard critical values cannot be used. For example, for the augmented Dickey-Fuller test statistic computed on residuals, the Engle-Granger critical values should be used.
As already pointed out in the comments, for a cointegration model there exists a corresponding error correction model (ECM). For the simple example above, first-differencing $y_t = \beta x_t + \epsilon_t$ gives the ECM $$ \Delta y_t = \beta \Delta x_t - \underbrace{ (y_{t-1} - \beta x_{t-1})}_{\epsilon_{t-1}} + \epsilon_t. $$ While the cointegration model describes the long-run equilibrium relationship between $x_t$ and $y_t$, the corresponding ECM describes the short-run relationship between $\Delta y_t$, $\Delta x_t$, and the deviation $\epsilon_{t-1}$ from the long-run level. Notice that all variables in the ECM are stationary. Which model you estimate depends on the question of interest. To estimate the ECM, regress $\Delta y_t$ on $\Delta x_t$ and $e_{t-1}$, where $e_t$ is the residual from the cointegrating regression.
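This two-step procedure (levels regression first, then the ECM regression on differences and the lagged residual) can be sketched as follows; for the DGP above the ECM coefficients on $\Delta x_t$ and $e_{t-1}$ should come out near $\beta$ and $-1$ respectively (seed, $T$, and $\beta = 2$ are again illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T, beta = 1000, 2.0
x = np.cumsum(rng.standard_normal(T))
y = beta * x + rng.standard_normal(T)

# Step 1: cointegrating regression in levels, y_t = beta * x_t + e_t.
beta_hat = (x @ y) / (x @ x)
e = y - beta_hat * x

# Step 2: ECM regression, Delta y_t = gamma * Delta x_t + alpha * e_{t-1} + v_t.
dy, dx, e_lag = np.diff(y), np.diff(x), e[:-1]
X = np.column_stack([dx, e_lag])
gamma_hat, alpha_hat = np.linalg.lstsq(X, dy, rcond=None)[0]
print(beta_hat, gamma_hat, alpha_hat)
```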