Fitting ARIMA to time series with missing values

Question

I have a stationary time series object (e.g., xts) consisting of weekly continuous data. Values are missing for several weeks, sometimes at random but often in chunks of 4-5 weeks. I want to fit a time series model to the data for forecasting, using the "arima" function.

Does the function "arima" take into account the missing weeks? Looks like it does not use the time index at all! Here's my code:

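library(xts)  # provides xts()

o.time.pos <- seq_len(52 * 5)  # five years of weekly positions
z.idx <- seq.Date(as.Date("2010/1/1"), by = "week", length.out = 52 * 5)
sigma <- 1.15
phi <- 0.8
# Simulate a stationary AR(1) series and index it by week
y_ts <- arima.sim(n = length(o.time.pos), list(ar = c(phi)), sd = sigma)
y.ts <- xts(as.numeric(y_ts), order.by = z.idx)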
y.ts.na <- y.ts
# Missing values as NAs
y.ts.na[c(40:45, 72:82)] <- NA
ar1 <- arima(y.ts.na, order=c(2,1,2), method="ML")
y.ts.na1 <- y.ts
# Missing values are deleted from the time series. However, the time 
# index shows that there are weeks missing
y.ts.na1 <- y.ts.na1[-c(40:45, 72:82)]
y.ts.na1
# Excerpt of the printed output around the first gap:
# 2010-09-10 -0.341071731
# 2010-09-17 -2.141615586
# 2010-09-24 -1.538637593
# 2010-11-12 -2.801102613
# 2010-11-19 -2.447482778
# 2010-11-26 -3.176720246
# 2010-12-03 -2.532530896
ar2 <- arima(y.ts.na1, order=c(2,1,2), method="ML")

I expect ar1 and ar2 to be the same, but they are not.

summary(ar1)

Call:
arima(x = y.ts.na, order = c(2, 1, 2), method = "ML")

Coefficients:
         ar1     ar2      ma1      ma2
      0.4367  0.1801  -0.6410  -0.3319
summary(ar2)

Call:
arima(x = y.ts.na1, order = c(2, 1, 2), method = "ML")

Coefficients:
         ar1     ar2      ma1      ma2
      0.5302  0.1231  -0.7473  -0.2298

It seems that in the second method, even though the xts object carries information about the missing weeks, "arima" does not use it and instead treats the time as contiguous. In the above example, it seems to treat the value for 2010-11-12 as the value for the week after 2010-09-24, and so on. This is clearly wrong. I understand that putting together the likelihood with missing data is not possible. What are my options for fitting a time series model to data with missing values?

I know one method is to impute the data (using, for example, the approaches in "How to use auto.arima to impute missing values" or "How do I handle nonexistent or missing data?") and then fit, but is it possible to fit without imputation?

score 11 · Accepted Answer · answered May 14 '18 at 23:55

The results given by stats::arima in the first approach (ar1) are correct: that fit takes the missing values into account. The second one (ar2) does not.

You can fit ARIMA models with missing values easily because every ARIMA model can be written as a state space model, and the Kalman filter, which is used to fit state space models, handles missing values exactly: at a time step with no observation it simply skips the update phase and carries the prediction forward. So "putting together the likelihood with missing data" is absolutely possible; the Kalman filter does exactly that. Any other state space model will allow you to do the same.
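To make the "skip the update" idea concrete, here is a minimal sketch in R. It assumes a zero-mean AR(1) model rather than the ARIMA(2,1,2) from the question, purely to keep the filter to a few lines. The helper ar1_loglik is hypothetical (written for this illustration, not part of any package). It accumulates the Gaussian log-likelihood only at observed time points and runs the prediction step at every time point, so the predictive variance grows across a gap.

```r
set.seed(1)
y <- arima.sim(n = 260, list(ar = 0.8), sd = 1.15)
y[c(40:45, 72:82)] <- NA  # explicit NAs, as in the first approach

# stats::arima accepts the NAs directly
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE, method = "ML")

# Hypothetical helper: Kalman filter log-likelihood for a zero-mean AR(1)
ar1_loglik <- function(y, phi, sigma2) {
  a <- 0                      # predicted state mean (stationary start)
  P <- sigma2 / (1 - phi^2)   # predicted state variance (stationary start)
  ll <- 0
  for (obs in as.numeric(y)) {
    if (!is.na(obs)) {
      # Update phase: only runs when an observation is available
      ll <- ll - 0.5 * (log(2 * pi * P) + (obs - a)^2 / P)
      a <- obs   # the state is observed without noise
      P <- 0
    }
    # Prediction phase: always runs, so uncertainty accumulates over gaps
    a <- phi * a
    P <- phi^2 * P + sigma2
  }
  ll
}

manual <- ar1_loglik(y, coef(fit)[["ar1"]], fit$sigma2)
print(c(manual = manual, arima = fit$loglik))
```

Evaluated at the fitted parameters, the hand-rolled value should agree with the log-likelihood that stats::arima reports, which is a quick way to check that the NAs were handled inside the filter rather than dropped beforehand.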

Unless you are specifically interested in an estimate of those missing values, you do not need to impute them. If you do so incorrectly, you could distort the dynamics, which would cause problems when trying to fit your model afterwards. If you only want to forecast the series, you should probably not impute them.

The question of why ar1 is correct but not ar2 is not exactly on topic here, but for the record: stats::arima expects your data as an object of class ts, not xts. If your data isn't ts, it will be converted using as.ts, which discards the date information. This means that the explicit NAs in the first approach are retained, while the implicit ones in the second do not appear at all, so the series is indeed just glued together. stats::arima expects an object of class ts because that class enforces regularly sampled data (at a certain frequency), whereas xts can carry arbitrarily sampled data, and classical ARIMA models are defined for regularly sampled data only.
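If the data arrive with the gap weeks already dropped, one way to recover the first approach is to rebuild the regular weekly grid and fill the absent weeks with NA before calling arima. The sketch below uses base R only and simulated data; the variable names (obs_dates, obs_vals, y_full) are illustrative, and an AR(1) order is assumed for brevity. With an xts object, index() and coredata() would supply the dates and values.

```r
set.seed(1)
dates <- seq(as.Date("2010-01-01"), by = "week", length.out = 52 * 5)
vals  <- as.numeric(arima.sim(n = length(dates), list(ar = 0.8), sd = 1.15))

# Drop the gap weeks entirely, as in the second approach (implicit gaps)
gaps <- c(40:45, 72:82)
obs_dates <- dates[-gaps]
obs_vals  <- vals[-gaps]

# Rebuild the regular weekly grid and place NA where no observation exists
full_dates <- seq(min(obs_dates), max(obs_dates), by = "week")
y_full <- rep(NA_real_, length(full_dates))
y_full[match(obs_dates, full_dates)] <- obs_vals

# Same observations, different treatment of time
fit_glued <- arima(obs_vals, order = c(1, 0, 0), method = "ML")  # gaps ignored
fit_na    <- arima(y_full,  order = c(1, 0, 0), method = "ML")   # gaps respected
print(rbind(glued = coef(fit_glued), with_na = coef(fit_na)))
```

Both fits use the same number of observations; only the second knows how far apart they are. This only works when the observed dates fall on the regular grid, which is the situation described in the question.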

Thank you for your reply.
A related precursor step to ARIMA fitting is detrending the data with missing values to get the residuals. We know that the popular "stl" and "decompose" functions do not accept missing values. What alternative methods do I have to detrend a time series with missing values? Currently I use simple imputation (e.g., na.approx, na.spline, etc.) to fill in the missing values before applying stl. Are there better alternatives? – saipk, May 15 '18 at 06:04
That seems like a separate question, but neither stats::stl nor stats::decompose produces any kind of forecast (they do not define dynamics for the components), so I'm not sure how you hope to use them for that. The estimated components also depend on future information (so do your imputation schemes, by the way), so their observed dynamics are distorted. Detrending is not a "precursor" to fitting ARIMA models; you should be including the trend in your model in a single step (either as a difference or as exogenous regressors). – Chris Haug, May 15 '18 at 12:14
Sure, including the trend in the model and fitting in a single step would be good. Let's say I just want to obtain the trend or a smoothed time series. How do I compute that when the time series has missing data? Are there alternatives to stl/decompose? – saipk, May 15 '18 at 17:02
You'd be better off asking a new question, but yes, any state space model that has some component that can be interpreted as the trend will work with missing data. – Chris Haug, May 15 '18 at 22:50
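As one possible illustration of that last comment, stats::StructTS fits a basic structural (state space) model and accepts missing values, and tsSmooth returns the smoothed state. The sketch below assumes a local level model and simulated data; whether a local level is an adequate trend for a given series is a modelling decision this example does not settle.

```r
set.seed(42)
n <- 200
level <- cumsum(rnorm(n, sd = 0.3))       # slowly wandering "trend"
y <- ts(level + rnorm(n, sd = 1))         # trend plus observation noise
y[c(40:45, 72:82)] <- NA                  # gaps, no imputation

fit   <- StructTS(y, type = "level")      # local level model, NAs allowed
trend <- tsSmooth(fit)                    # smoothed level, defined in the gaps too

print(fit$coef)                           # estimated variances
print(head(cbind(observed = y, trend = trend), 5))
```

The smoothed level is available at every time point, including the gap weeks, because the smoother interpolates through the state equation rather than through the observations.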
