I have weekly time series data, which looks as follows:

[Figure: time series plot of the original weekly data]

The data seem to be non-stationary, so I took the first difference of the data. The differenced series seems to be more stationary.

[Figure: time series plot of the first differenced data]

After that, I applied the auto.arima function to the first differenced data to find the best model. It suggested a seasonal ARIMA model.

Best model: ARIMA(0,0,0)(1,0,0)[52] with zero mean
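
For reference, this model can be written out explicitly. The notation is an assumption and does not come from the auto.arima output: $y_t$ is the first differenced series, $\phi$ is the seasonal AR(1) coefficient, $L$ is the lag operator and $\varepsilon_t$ is white noise.

$$
(1 - \phi L^{52})\, y_t = \varepsilon_t
\quad\Longleftrightarrow\quad
y_t = \phi\, y_{t-52} + \varepsilon_t
$$

In words, the value in a given week depends only on the value in the same week one year earlier, plus a disturbance.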

However, as you can see based on the time series plot for the first differenced data, there seems to be no seasonal trend in the data.

So what should I do now? Should I follow the results of auto.arima, or is there an alternative path that I can follow?

To check whether there is any seasonality in the first differenced data, I also took the seasonal difference of the first differenced data (differencing the first differenced data at lag 52). The following is the time series plot of the seasonally differenced first differenced data.

[Figure: time series plot of the seasonal difference (lag 52) of the first differenced data]

To me, there is no noticeable difference between the time series plot of the first differenced data and the time series plot of the seasonal difference of the first differenced data.
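
The two differencing steps described above can be sketched as follows. This is only a minimal illustration in Python with made-up numbers, not the actual data; the helper `difference` and the series `weekly` are hypothetical names introduced for the example.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t = lag, ..., len(series) - 1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical stand-in for the weekly series: three years of made-up values
# with a linear trend plus a pattern that repeats every 52 weeks.
weekly = [float(t % 52) + 0.1 * t for t in range(156)]

first_diff = difference(weekly, lag=1)          # first difference
seasonal_diff = difference(first_diff, lag=52)  # seasonal difference at lag 52

# Each differencing step shortens the series by its lag.
print(len(weekly), len(first_diff), len(seasonal_diff))
```

In R, the same two steps would be `diff(diff(x), lag = 52)`. Note that the lag-52 difference costs a full year of observations, which matters for a short weekly series.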

Update

This is the plot that I obtained using the seasonplot function in the forecast package for the first differenced data.

[Figure: seasonplot of the first differenced data]

There seems to be a slight peak around weeks 24-25, but I am not sure whether it is pronounced enough to justify a seasonal model.

Also, I have updated the question by posting some results based on the original (non-differenced) data.

The auto.arima function also suggested a seasonal ARIMA model for the original data.

 Best model: ARIMA(0,1,0)(1,0,0)[52]

This is the seasonplot for the original data.

[Figure: seasonplot of the original data]

There is a peak around weeks 12-13. Are these results enough to justify a seasonal ARIMA model?

Any advice would be highly appreciated.

Thank you.

  • Consider supplying the original (non-differenced) data to auto.arima. "However, as you can see based on the time series plot for the first differenced data, there seems to be no seasonal trend in the data." It is actually quite hard to see. Consider including a seasonal plot; there is a function seasonplot or ggseasonplot for it in the forecast package. – Richard Hardy Oct 07 '20 at 05:10
  • @RichardHardy Hi, is it okay, since the data is non-stationary? – Sam88 Oct 07 '20 at 05:13
  • @Sam88, it is OK because auto.arima does the differencing for you based on a smart algorithm. – Richard Hardy Oct 07 '20 at 05:16
  • @RichardHardy thanks. I am also new to time series forecasting. Do you recommend using machine learning techniques such as rolling cross-validation to find the time series model parameters? – Sam88 Oct 07 '20 at 05:21
  • And if you have daily data, see also this regarding long seasonal periods. – Richard Hardy Oct 07 '20 at 05:44
  • "However, as you can see...for the first differenced data, there seems to be no seasonal trend in the data..."---that's a loose statement made from eyeballing the data. The suggested SARIMA(0,0,0)(1,0,0)[52] model says the series is made up of 52 constituent AR(1) series, one for each week, with a lag of one year. What happens this week depends on what happened the same week last year, plus a disturbance. You may want to consider whether you have a clear reason to discount this possibility. A sample simulated from a SARIMA(0,0,0)(1,0,0)[52] model would not look very different from your data. – Michael Oct 07 '20 at 05:56
  • "I...took the seasonal difference of the first differenced data..."---there's no reason to do this unless you believe the series is seasonally integrated, which does not seem to be the case here for your first differenced data. (To check the SARIMA(0,0,0)(1,0,0)[52] fit, one would apply $1-\phi L^{52}$ to the series, where $\phi$ is the SAR(1) coefficient, then check whiteness of the resulting series.) – Michael Oct 07 '20 at 06:07
  • @RichardHardy: do you want to post your comment(s) as an answer? Better to have a short answer than no answer at all. Anyone who has a better answer can post it. – Stephan Kolassa Oct 07 '20 at 07:01
  • @Michael: do you want to post your comment(s) as an answer? Better to have a short answer than no answer at all. Anyone who has a better answer can post it. – Stephan Kolassa Oct 07 '20 at 07:01
  • @RichardHardy I have updated the question with the seasonal plots. Please have a look if you have time. – student_R123 Oct 07 '20 at 13:59
  • @Michael You have made a very good point. I never thought of it like that. I have updated the question with the seasonal plots. Do you think what we saw in those plots is enough to go with a seasonal ARIMA model? – student_R123 Oct 07 '20 at 14:02
  • @StephanKolassa I fitted a seasonal ARIMA model for this data and the prediction accuracy was very bad. I also fitted a random forest model with the lagged price (lag=1) as the predictor. The prediction results were very good. Is the use of a random forest model appropriate for time series? – student_R123 Oct 13 '20 at 04:02
  • If it works, it works. Did you assess prediction accuracy in-sample or out-of-sample? And which accuracy metric are you using? – Stephan Kolassa Oct 13 '20 at 05:24
  • @StephanKolassa Yeah. I used MSE as the metric. In fact I posted another question regarding the comparison of two models. Please have a look if you have time. https://stats.stackexchange.com/questions/491871/comparing-a-arima-model-with-random-forest-model-for-time-series-data – student_R123 Oct 14 '20 at 05:04
  • MSE is good. Are you using a holdout sample? – Stephan Kolassa Oct 14 '20 at 05:18
  • @StephanKolassa Holdout means a validation set, doesn't it? Yes, I trained the model on the first 200 observations and evaluated it on the rest of the observations. – student_R123 Oct 17 '20 at 00:35
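
Michael's two suggestions in the comments above (simulating from the seasonal AR(1) model, and applying $1-\phi L^{52}$ before checking whiteness) can be illustrated with a rough sketch. It is written in Python rather than R, uses only simulated data, and all names (`simulate_sar1`, `seasonal_ar_filter`, `autocorr`) and the value `phi = 0.5` are assumptions made for the example. It inspects only the lag-52 autocorrelation, which is a much weaker check than a formal whiteness test.

```python
import random

def simulate_sar1(n, phi, period=52, seed=0):
    """Simulate y[t] = phi * y[t - period] + e[t] with standard normal e[t]."""
    rng = random.Random(seed)
    y = []
    for t in range(n):
        past = y[t - period] if t >= period else 0.0  # start-up values set to 0
        y.append(phi * past + rng.gauss(0.0, 1.0))
    return y

def seasonal_ar_filter(y, phi, period=52):
    """Apply (1 - phi * L^period): return y[t] - phi * y[t - period]."""
    return [y[t] - phi * y[t - period] for t in range(period, len(y))]

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

phi = 0.5  # hypothetical seasonal AR(1) coefficient, not an estimate from the data
y = simulate_sar1(n=52 * 40, phi=phi)  # 40 simulated years, far more than in the question
resid = seasonal_ar_filter(y, phi)

print("lag-52 autocorrelation of y:    ", round(autocorr(y, 52), 3))
print("lag-52 autocorrelation of resid:", round(autocorr(resid, 52), 3))
```

If the model is adequate, the filtered series should show little remaining correlation at lag 52, while the simulated series itself should show a clear one. With real data, $\phi$ would be replaced by the fitted coefficient and the residuals examined at all lags.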

0 Answers