I have a stationary time series (stored as an xts object, for example) of weekly, continuous-valued data. Values are missing for several weeks, sometimes at random but often in runs of 4-5 weeks. I want to fit a time series model to the data with the "arima" function and use it for forecasting.
Does "arima" take the missing weeks into account? It looks like it does not use the time index at all. Here is my code:
library(xts)
o.time.pos <- seq_len(52 * 5)
z.idx <- seq.Date(as.Date("2010/1/1"), by="week", length.out = 52*5)
sigma <- 1.15
phi <- 0.8
y_ts <- arima.sim(model = list(ar = phi), n = length(o.time.pos), sd = sigma)
y.ts <- xts(as.numeric(y_ts), order.by=z.idx)
y.ts.na <- y.ts
# Keep the regular weekly index and mark the missing weeks as NA
y.ts.na[c(40:45, 72:82)] <- NA
ar1 <- arima(y.ts.na, order=c(2,1,2), method="ML")
y.ts.na1 <- y.ts
# Delete the missing weeks from the series. The remaining time index
# still shows that weeks are missing
y.ts.na1 <- y.ts.na1[-c(40:45, 72:82)]
y.ts.na1  # printed output, excerpt around the first gap (positions 40 to 45)
2010-09-10 -0.341071731
2010-09-17 -2.141615586
2010-09-24 -1.538637593
2010-11-12 -2.801102613
2010-11-19 -2.447482778
2010-11-26 -3.176720246
2010-12-03 -2.532530896
ar2 <- arima(y.ts.na1, order=c(2,1,2), method="ML")
I expected ar1 and ar2 to be the same, but they are not:
summary(ar1)
Call:
arima(x = y.ts.na, order = c(2, 1, 2), method = "ML")
Coefficients:
   ar1     ar2      ma1      ma2
0.4367  0.1801  -0.6410  -0.3319
summary(ar2)
Call:
arima(x = y.ts.na1, order = c(2, 1, 2), method = "ML")
Coefficients:
   ar1     ar2      ma1      ma2
0.5302  0.1231  -0.7473  -0.2298
In the second method, even though the xts object carries information about the missing weeks, "arima" does not seem to use it and instead treats the observations as contiguous. In the example above, it appears to treat the value for 2010-11-12 as the observation for the week right after 2010-09-24, and so on. This is clearly wrong. My understanding is that the likelihood cannot be put together when data are missing. What are my options for fitting a time series model to data with missing values?
I know one method is to impute the data first (using, for example, "How to use auto.arima to impute missing values" or "How do I handle nonexistent or missing data?") and then fit, but is it possible to fit without imputation?
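To make the comparison concrete, here is a minimal base-R sketch of the two treatments: deleting the missing weeks versus putting the observations back on a regular weekly grid with NA in the gaps. The help page for stats::arima states that missing values are allowed and are handled exactly with method = "ML", so the NA version is the one that keeps the weekly spacing. The seed, the variable names (obs_dates, y_grid and so on) and the order c(1, 0, 0), chosen to match the simulated AR(1) process, are assumptions made for illustration and are not part of the code above.

```r
# Simulated weekly AR(1) series, mirroring the setup in the question
set.seed(42)  # assumption: seed added so the sketch is reproducible
n     <- 52 * 5
dates <- seq.Date(as.Date("2010-01-01"), by = "week", length.out = n)
y     <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 1.15))

# The irregular series: rows for the missing weeks have been deleted
gap       <- c(40:45, 72:82)
obs_dates <- dates[-gap]
obs_y     <- y[-gap]

# Rebuild the regular weekly grid and leave NA where no observation exists
full_dates <- seq.Date(min(obs_dates), max(obs_dates), by = "week")
y_grid     <- rep(NA_real_, length(full_dates))
y_grid[match(obs_dates, full_dates)] <- obs_y

# Fit on the gridded series (gaps kept as NA) and on the collapsed series
fit_na      <- arima(y_grid, order = c(1, 0, 0), method = "ML")
fit_dropped <- arima(obs_y,  order = c(1, 0, 0), method = "ML")

# Side-by-side coefficients of the two fits
rbind(with_NA = coef(fit_na), rows_dropped = coef(fit_dropped))
```

The collapsed fit treats the week after a gap as if it directly followed the week before it, which is the behaviour described above. The gridded fit sees the same observations at their true weekly positions.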
A related precursor step to ARIMA fitting is detrending the data with missing values to get the residuals. The popular "stl" and "decompose" functions do not accept missing values. What alternative methods are there to detrend a time series with missing values? Currently I use simple imputation (e.g. na.approx, na.spline) to fill in the missing values before applying stl. Are there better alternatives?
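One possible way to detrend without imputing, sketched here under the assumption that a deterministic linear trend is adequate, is an ordinary regression on time with na.action = na.exclude. The residuals are then returned at full length, with NA in the missing weeks, so the weekly spacing is preserved. The simulated slope of 0.5 per year, the seed and the names t_years, trend_fit and detrended are made up for illustration.

```r
# Simulated weekly series: linear trend plus AR(1) noise, with gaps
set.seed(1)  # assumption: seed added so the sketch is reproducible
n       <- 52 * 5
t_years <- seq_len(n) / 52
y       <- 0.5 * t_years +
  as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 1.15))
y[c(40:45, 72:82)] <- NA

# na.exclude drops the NA rows for estimation but pads residuals() back
# to the original length, so the gaps stay in place as NA
trend_fit <- lm(y ~ t_years, na.action = na.exclude)
detrended <- residuals(trend_fit)

length(detrended)      # same length as y
sum(is.na(detrended))  # one NA per missing week
```

This is not a replacement for stl when a seasonal component is needed. It only shows that the trend can be estimated from the observed weeks alone.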
– saipk May 15 '18 at 06:04
Neither stats::stl nor stats::decompose produce any kind of forecast (they do not define dynamics for the components), so I'm not sure how you hope to use them for that. The estimated components also depend on future information (so do your imputation schemes, by the way), so their observed dynamics are distorted. Detrending is not a "precursor" to fitting ARIMA models; you should be including the trend in your model in a single step (either as a difference or as exogenous regressors). – Chris Haug May 15 '18 at 12:14
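A minimal sketch of the single-step approach from the comment above, assuming a linear trend passed to stats::arima through xreg, with the missing weeks left as NA in the response. The simulated data, the seed, the column name "trend", the order c(1, 0, 0) and the 8-week horizon are illustrative assumptions.

```r
# Simulated weekly series: linear trend plus AR(1) noise, with gaps
set.seed(1)  # assumption: seed added so the sketch is reproducible
n     <- 52 * 5
trend <- cbind(trend = seq_len(n) / 52)  # time in years, as a 1-column matrix
y     <- 0.5 * trend[, "trend"] +
  as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 1.15))
y[c(40:45, 72:82)] <- NA

# Trend and AR(1) errors are estimated jointly; NAs in y stay in place
fit <- arima(y, order = c(1, 0, 0), xreg = trend, method = "ML")
coef(fit)

# Forecast 8 weeks ahead by supplying future values of the regressor
h         <- 8
new_trend <- cbind(trend = (n + seq_len(h)) / 52)
fc        <- predict(fit, n.ahead = h, newxreg = new_trend)
fc$pred
```

Because the regressor is a known function of time, its future values are available for forecasting, which is not the case for residuals taken from a separate stl or decompose step.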