4

I'm using R to do some time series estimation. I'm trying to rebuild the fitted values from an Arima model by hand to use in an Excel spreadsheet using the estimated coefficients and the input data. I can use the fitted command, but I'm trying to understand more how it works. Ex:

library(MASS)
library(tseries)
library(forecast)

set.seed(1)
N = ts(mvrnorm(50, mu=c(0,0), Sigma=matrix(c(1,0.56,0.56,1), ncol=2), 
       empirical=TRUE), frequency=12)
head(N)

>            [,1]       [,2]
>[1,] -0.05270976  0.7239571
>[2,] -0.67232349 -0.6631604
>[3,] -0.20193415  0.8176053
>[4,] -0.54278281 -2.0458285
>[5,]  1.38279994  0.9405811
>[6,]  1.39979731  2.1717733

# Model: x(t) = a * x(t-1) + e(t)
fit = Arima(N[,1], order=c(1,0,0), include.constant=FALSE)

> fit  
>Series: N[, 1]  
>ARIMA(1,0,0) with zero mean          
>
>Coefficients:  
>         ar1  
>       0.0293
>s.e.   0.1400  
>
>sigma^2 estimated as 0.9791:  log likelihood=-70.42
>AIC=144.84   AICc=145.1   BIC=148.66

# Build the fitted values: x(t)=a * x(t-1) 
pred  = fit$coef[1] * lag(fit$x, -1) 
pred1 = fitted(fit)
head(cbind(pred, pred1))   

>             pred         pred1
>[1,]           NA -2.255567e-05
>[2,] -0.001541849 -1.541849e-03
>[3,] -0.019666597 -1.966660e-02
>[4,] -0.005906915 -5.906915e-03
>[5,] -0.015877313 -1.587731e-02
>[6,]  0.040449232  4.044923e-02 

In this case, pred and pred1 match.

However when I add in an xreg:

# Model: x(t) = a*x(t-1) + b*xreg + e(t)
fit1 = Arima(N[,1], order=c(1,0,0), xreg=N[,2], include.constant=FALSE)

>fit  
>Series: N[, 1]  
>ARIMA(1,0,0) with zero mean         
>
>Coefficients:  
>         ar1  N[, 5]  
>       0.0860  0.5606  
>s.e.   0.1401  0.1155  
>
>sigma^2 estimated as 0.6675:  log likelihood=-60.85
>AIC=127.69   AICc=128.22   BIC=133.4

# Build the fitted values: x(t) = a*x(t-1) + b*xreg 
pred2  = fit1$coef[1]*lag(fit1$x, -1) + fit1$coef[2]*fit1$xreg 
pred21 = fitted(fit1) 
head(cbind(pred2, pred21))

>              pred2     pred21
>[1,]         NA  0.4041670
>[2,]  0.4013329 -0.4112205
>[3,] -0.4296032  0.4325201
>[4,]  0.4410005 -1.2037229
>[5,] -1.1936161  0.5792684
>[6,]  0.6462336  1.2911169

In this case, pred2 and pred21 do not match, and the only thing changed was adding an xreg. The only time I cannot build out the fitted values by hand is when the AR part is included. I was able to do it when only MA parts were included with the xreg. I would really appreciate knowing how Arima treats xreg when generating the fitted values.

ajhubb
  • 41
  • 1
  • 3
  • This sounds like a programming question rather than a statistics question per se. Are you just asking how the function is working? What language are you using, R? – gung - Reinstate Monica Aug 04 '14 at 17:09
  • I'm using R, I will add that to the post. More so I need to know how the function is working so I can rebuild the model by hand rather than using the fitted() command. – ajhubb Aug 04 '14 at 17:10
  • If you just want to see the code for the function, type the function name at the command prompt. I'm still not sure whether this is about understanding how ARIMA works (in which case, it belongs here) or understanding how the code works (in which case, it probably belongs on Stack Overflow). However, it will need a reproducible example; can you add one? – gung - Reinstate Monica Aug 04 '14 at 17:18
  • I want to know why the fitted values I'm building do not match the fitted values produced by the fitted() command. I will try to add the data set and code. – ajhubb Aug 04 '14 at 17:26
  • 1
    I wasn't sure how to add the data set, so I just rewrote my post with a reproducible example. Hope this helps – ajhubb Aug 04 '14 at 18:05
  • Thanks for doing that. Unfortunately, I don't know enough about time-series etc to help w/ this, but there are experts here. For now we can leave your Q here, but others w/ a clearer view of the issues may recommend migrating it to SO, if appropriate (the idea would just be to get you to where you can best be helped). – gung - Reinstate Monica Aug 04 '14 at 18:25

3 Answers3

9

You have misunderstood the model. It is not $$ y_t = ay_{t-1} + bx_t + e_t$$ as you assume. Rather it is \begin{align} y_t & = bx_t + n_t \\ n_t &= a n_{t-1} + e_t. \end{align} This is explained in the help file for arima:

If an xreg term is included, a linear regression (with a constant term if include.mean is true and there is no differencing) is fitted with an ARMA model for the error term.

There is further discussion comparing these two models on my blog at http://robjhyndman.com/hyndsight/arimax/.

Note: You appear to be using the forecast package, although this is not loaded in your code.

Rob Hyndman
  • 56,782
  • Thanks, Mr. Hyndman. I was able to find the solution to my problem and posted the solution above. – ajhubb Aug 05 '14 at 21:58
1

Thank you, Prof. Hyndman, for the answer. Thank you, ajhubb, for posting your solution based on Prof. Hyndman's comments. I'm working through manually rebuilding the values by hand as well.

(My apologies for posting this as an answer, but I don't have enough reputation to comment on your answer.)

When I ran the relevant snippet of code in ajhubb's answer, I got the following error:

Error in '+.default'(n_fit, f$coef[3] * f$xreg) : non-conformable arrays

It was caused by n_fit being a 49-element ts object, while f$xreg is a 50x1 matrix object. Here is the corrected snippet of code I ran:

 library(MASS)
 library(forecast)

 set.seed(1)
 N <- ts(mvrnorm(50, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2), empirical = TRUE), frequency = 12)
 f <- Arima(N[,1], order = c(2,0,0), xreg = N[,2], include.constant = FALSE) # Fits a linear regression with AR(2) error structure
 n <- f$x - f$coef[3] * f$xreg # Errors from linear regression component
 n_l1 <- lag(n, -1)
 n_l2 <- lag(n, -2)
 n_fit <- (f$coef[1] * n_l1) + (f$coef[2] * n_l2) 
 pred <-  n_fit[c(1:48)] + (f$coef[3] * f$xreg)[c(3:50)] # n_fit elements 1:48 are t=3,...,50; this matches f$xreg elements 3:50
 pred # View manually fitted values
 fitted(f)[c(3:50)] # View fitted values from f object
 round(pred, 14) == round(fitted(f)[c(3:50)], 14)
0

Thanks to Mr. Hyndman, here is the solution

    f = Arima(N[,1], order=c(2,0,0), xreg=N[,2], include.constant=FALSE)
    n=f$x-f$coef[3]*f$xreg #finds the AR error part of the model
    n_l1=lag(n,-1)
    n_l2=lag(n,-2)
    n_fit=n_l1*f$coef[1]+f$coef[2]*n_l2 #finds the fitted AR errors
    pred=n_fit+f$coef[3]*f$xreg #Gives correct fitted values  
ajhubb
  • 41
  • 1
  • 3
  • Although implementation is often mixed with substantive content in questions, we are supposed to be a site for providing information about statistics, machine learning, etc., not code. It can be good to provide code as well, but please elaborate your substantive answer in text for people who don't read this language well enough to recognize & extract the answer from the code. – Sycorax Feb 06 '21 at 18:02