
Given daily ticket data from February 2022 to July 30, 2022, I am trying to forecast the number of tickets received. I am unsure which model to use (ARIMA, SARIMA, SSE) based on my data. I have attached a picture of the trend of the data. The first 130 days or so are for training, and the rest (about 40 days) are the test data, which I will make predictions on.

Overall, I am just pretty confused about the process of finding the (AR, I, MA) order parameters for the model. I think that may be why the predictions are not very accurate. Also, does this data need to be made stationary?

Ticket count data(y): [[ 66] [ 60] [ 76] [ 86] [ 37] [ 23] [ 38] [110] [ 82] [ 58] [ 92] [ 48] [ 5] [ 45] [ 63] [ 71] [ 49] [ 69] [ 52] [ 11] [ 22] [ 44] [ 35] [ 74] [ 76] [ 32] [ 31] [112] [ 53] [ 47] [ 39] [ 55] [ 33] [ 20] [ 60] [ 61] [ 39] [ 46] [ 62] [ 24] [ 11] [ 24] [ 28] [ 36] [ 17] [ 19] [ 18] [ 10] [ 32] [ 26] [ 29] [ 24] [ 27] [ 47] [ 5] [ 46] [ 24] [ 40] [108] [ 68] [ 77] [ 11] [ 13] [ 20] [ 32] [ 22] [ 55] [ 46] [ 6] [ 40] [ 36] [ 34] [ 75] [ 39] [ 37] [ 30] [ 64] [ 67] [ 47] [ 63] [ 33] [ 35] [ 2] [ 42] [ 45] [ 30] [ 29] [ 16] [ 20] [ 12] [ 33] [ 50] [ 67] [109] [ 27] [ 6] [ 3] [ 5] [ 73] [ 80] [ 58] [ 30] [ 59] [ 77] [ 60] [111] [ 38] [ 43] [ 8] [ 35] [ 95] [ 68] [ 42] [ 73] [ 17] [ 19] [ 64] [ 14] [ 72] [129] [ 73] [ 28] [ 5] [ 24] [ 90] [ 89] [ 29] [ 63] [ 18] [ 6] [ 28] [ 47] [ 30] [ 35] [109] [ 87] [ 4] [ 56] [ 24] [ 13] [ 28] [ 77] [ 55] [ 67] [ 36] [ 54] [ 70] [129] [ 59] [ 78] [ 15] [ 77] [116] [129] [129] [129] [ 56] [ 32] [125] [ 86] [129] [129] [129] [ 83] [ 69] [104] [ 91] [ 35] [ 33] [ 25] [ 45] [ 2] [ 24] [ 42] [ 75] [ 53] [ 35] [ 32]]

[Figure: plot of the daily ticket counts over time]

ty101

1 Answer


Don't fit an ARIMA model by hand. It's much better to rely on a tested software implementation, like the forecast and fable packages for R.

library(forecast)
tickets <- ts(c( 66, 60, 76, 86, 37, 23, 38,110, 82, 58, 92, 48, 5, 45, 63, 71,
49, 69, 52, 11, 22, 44, 35, 74, 76, 32, 31,112, 53, 47, 39, 55, 33, 20, 60, 61,
39, 46, 62, 24, 11, 24, 28, 36, 17, 19, 18, 10, 32, 26, 29, 24, 27, 47, 5, 46, 24,
40,108, 68, 77, 11, 13, 20, 32, 22, 55, 46, 6, 40, 36, 34, 75, 39, 37, 30, 64, 67,
47, 63, 33, 35, 2, 42, 45, 30, 29, 16, 20, 12, 33, 50, 67,109, 27, 6, 3, 5, 73, 
80, 58, 30, 59, 77, 60,111, 38, 43, 8, 35, 95, 68, 42, 73, 17, 19, 64, 14, 72,129, 
73, 28, 5, 24, 90, 89, 29, 63, 18, 6, 28, 47, 30, 35,109, 87, 4, 56, 24, 13, 28, 
77, 55, 67, 36, 54, 70,129, 59, 78, 15, 77,116,129,129,129, 56, 32,125, 
86,129,129,129, 83, 69,104, 91, 35, 33, 25, 45, 2, 24, 42, 75, 53, 35, 32), 
frequency=7)

train <- ts(tickets[1:130], frequency=7, start=c(1,1))   # days 1 to 130
test <- ts(tickets[131:length(tickets)], start=c(19,5), frequency=7)   # days 131 onward; day 131 is week 19, day 5
seasonplot(train)

[Figure: seasonal plot of the training data by day of week]

Don't take the labels on the horizontal axis seriously. They are very probably off, since I don't know the precise date on which your series starts. The point is that the likely weekly seasonality of your ticket counts is not very pronounced, but one day ("Friday" in the plot, though per the above it may well be a different day, probably Sunday) does show systematically lower counts. So there is probably some seasonality in here.
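The weekday pattern in the seasonal plot can also be checked numerically. The following is a minimal sketch using only base R (not the forecast package): it averages the training data by position within the week. As with the plot, position 1 is simply the weekday of the first observation, which is not known here.

```r
# Daily ticket counts from the question (178 days).
tickets <- c( 66, 60, 76, 86, 37, 23, 38,110, 82, 58, 92, 48,  5, 45, 63, 71,
  49, 69, 52, 11, 22, 44, 35, 74, 76, 32, 31,112, 53, 47, 39, 55, 33, 20, 60, 61,
  39, 46, 62, 24, 11, 24, 28, 36, 17, 19, 18, 10, 32, 26, 29, 24, 27, 47,  5, 46, 24,
  40,108, 68, 77, 11, 13, 20, 32, 22, 55, 46,  6, 40, 36, 34, 75, 39, 37, 30, 64, 67,
  47, 63, 33, 35,  2, 42, 45, 30, 29, 16, 20, 12, 33, 50, 67,109, 27,  6,  3,  5, 73,
  80, 58, 30, 59, 77, 60,111, 38, 43,  8, 35, 95, 68, 42, 73, 17, 19, 64, 14, 72,129,
  73, 28,  5, 24, 90, 89, 29, 63, 18,  6, 28, 47, 30, 35,109, 87,  4, 56, 24, 13, 28,
  77, 55, 67, 36, 54, 70,129, 59, 78, 15, 77,116,129,129,129, 56, 32,125,
  86,129,129,129, 83, 69,104, 91, 35, 33, 25, 45,  2, 24, 42, 75, 53, 35, 32)

# Training part (first 130 days) as a weekly-seasonal series.
train <- ts(tickets[1:130], frequency = 7, start = c(1, 1))

pos <- as.integer(cycle(train))  # position within the week, 1..7
y   <- as.numeric(train)

# Mean and median count for each position within the week.
weekday_mean   <- tapply(y, pos, mean)
weekday_median <- tapply(y, pos, median)
print(round(cbind(mean = weekday_mean, median = weekday_median), 1))
```

Looking at the median next to the mean is a deliberate choice. With counts this noisy, a few very high days (such as the 129s) can pull a weekday's mean up, while the median is less affected.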

Fit a model using auto.arima(). It uses unit-root and seasonality tests to choose the orders of differencing, and an information criterion (AICc by default) to choose the AR and MA orders. It is the gold standard for automatic time series forecasting. I believe that there is a reimplementation in the pmdarima package for Python.
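To illustrate the information-criterion part of that search, here is a simplified sketch in base R. It is not what auto.arima() actually does: it skips the differencing tests by assuming d = D = 0, uses a small exhaustive grid instead of auto.arima()'s stepwise search, and covers only seasonal AR terms. It fits each candidate with stats::arima() and ranks the candidates by AICc.

```r
tickets <- c( 66, 60, 76, 86, 37, 23, 38,110, 82, 58, 92, 48,  5, 45, 63, 71,
  49, 69, 52, 11, 22, 44, 35, 74, 76, 32, 31,112, 53, 47, 39, 55, 33, 20, 60, 61,
  39, 46, 62, 24, 11, 24, 28, 36, 17, 19, 18, 10, 32, 26, 29, 24, 27, 47,  5, 46, 24,
  40,108, 68, 77, 11, 13, 20, 32, 22, 55, 46,  6, 40, 36, 34, 75, 39, 37, 30, 64, 67,
  47, 63, 33, 35,  2, 42, 45, 30, 29, 16, 20, 12, 33, 50, 67,109, 27,  6,  3,  5, 73,
  80, 58, 30, 59, 77, 60,111, 38, 43,  8, 35, 95, 68, 42, 73, 17, 19, 64, 14, 72,129,
  73, 28,  5, 24, 90, 89, 29, 63, 18,  6, 28, 47, 30, 35,109, 87,  4, 56, 24, 13, 28,
  77, 55, 67, 36, 54, 70,129, 59, 78, 15, 77,116,129,129,129, 56, 32,125,
  86,129,129,129, 83, 69,104, 91, 35, 33, 25, 45,  2, 24, 42, 75, 53, 35, 32)
train <- ts(tickets[1:130], frequency = 7)
n <- length(train)

# Candidate orders: non-seasonal AR p, MA q, and seasonal AR P at period 7.
# Assumption: no differencing (d = D = 0).
grid <- expand.grid(p = 0:2, q = 0:2, P = 0:2)
grid$aicc <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(train,
          order    = c(grid$p[i], 0, grid$q[i]),
          seasonal = list(order = c(grid$P[i], 0, 0), period = 7),
          method   = "ML"),
    error = function(e) NULL)  # skip candidates that fail to fit
  if (!is.null(fit)) {
    k <- length(fit$coef) + 1  # coefficients (incl. mean) + noise variance
    grid$aicc[i] <- fit$aic + 2 * k * (k + 1) / (n - k - 1)
  }
}

# Five best candidates, lowest AICc first.
print(head(grid[order(grid$aicc), ], 5))
best <- grid[which.min(grid$aicc), ]
```

The real auto.arima() handles more cases, for example seasonal MA terms, differencing, and fallbacks when a fit fails. Relying on it, as the answer recommends, is safer than hand-rolling such a loop.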

model_arima <- auto.arima(train)
fcst_arima <- forecast(model_arima, h=length(test))
plot(fcst_arima)
lines(test,col="red",lwd=2)

[Figure: ARIMA forecast with prediction intervals; test data in red]

auto.arima() fits a SARIMA model, here a (0,0,1)(2,0,0)[7] with a nonzero mean, which yields a decaying seasonal forecast. This makes sense, but the forecast is not very good: as we see, the actual values fall outside the prediction intervals quite often. Here are the accuracy measures:

> accuracy(fcst_arima, test)
                     ME     RMSE      MAE       MPE      MAPE      MASE        ACF1 Theil's U
Training set -0.5000957 23.73706 19.00397 -64.26156  89.02182 0.8079806 -0.01727246        NA
Test set     17.2089423 43.91135 35.64475 -75.64584 128.12766 1.5154873  0.50709598 0.6938199

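Your series is very noisy and contains some very small values (down to 2 tickets on a day). On those days the percentage errors blow up and dominate the MAPE, so I would completely disregard the MAPE and rather trust the RMSE. Note also that the prediction intervals go below zero, which is nonsensical for counts.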
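A small worked example shows why a single low-count day can dominate the MAPE. The four actual and forecast values below are made up for illustration and do not come from your data. The error convention, actual minus forecast with percentages relative to the actual, is the one behind the ME/MPE/MAPE columns above.

```r
# Hypothetical actuals and forecasts (not from the question's data).
actual <- c(40, 35, 2, 60)
fc     <- c(45, 30, 30, 50)

err  <- actual - fc            # forecast errors: -5, 5, -28, 10
rmse <- sqrt(mean(err^2))      # about 15.28
mae  <- mean(abs(err))         # 12
pe   <- 100 * err / actual     # percentage errors; the third one is -1400
mape <- mean(abs(pe))          # about 360.9, driven almost entirely by day 3
mape_without_day3 <- mean(abs(pe[-3]))  # about 14.5

print(c(RMSE = rmse, MAE = mae, MAPE = mape, MAPE_without_day3 = mape_without_day3))
```

An absolute error of 28 tickets on day 3 is not dramatic on the RMSE or MAE scale. Because the actual value is only 2, however, it becomes a 1400% error and swamps the other three days in the MAPE.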

We can also fit an exponential smoothing model:

model_ets <- ets(train)
fcst_ets <- forecast(model_ets, h=length(test))
plot(fcst_ets)
lines(test,col="red",lwd=2)

[Figure: ETS forecast with prediction intervals; test data in red]

This looks a little more sophisticated: unlike the ARIMA forecast, the seasonal pattern of the ETS forecast does not decay. However, the accuracy is almost exactly the same:

> accuracy(fcst_ets, test)
                    ME     RMSE      MAE       MPE      MAPE      MASE      ACF1 Theil's U
Training set -0.792637 25.35551 19.49958 -51.77954  77.60262 0.8290523 0.2308526        NA
Test set     16.393501 43.86697 36.97665 -72.40524 129.73652 1.5721147 0.4747224 0.8154524
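Why does the ARIMA seasonal forecast decay while the ETS one does not? A stationary seasonal AR model pulls each weekday back toward the overall mean, week after week, whereas an additive seasonal ETS forecast repeats fixed weekday effects indefinitely. The sketch below shows this with hypothetical parameter values (mean 45, seasonal AR coefficients 0.3 and 0.2 at lags 7 and 14). These are not the values auto.arima() estimated, and the sketch omits the fitted model's MA(1) term for simplicity. The starting values are the last 14 training observations from the question.

```r
# Hypothetical parameters (NOT the fitted values).
mu  <- 45            # overall mean
Phi <- c(0.3, 0.2)   # seasonal AR coefficients at lags 7 and 14
m   <- 7

# Last 14 training observations from the question (days 117 to 130).
history <- c(64, 14, 72, 129, 73, 28, 5, 24, 90, 89, 29, 63, 18, 6)

# Point forecasts for y_t = mu + Phi1 (y_{t-7} - mu) + Phi2 (y_{t-14} - mu) + noise,
# computed recursively; earlier forecasts stand in for unknown future values.
seasonal_ar_forecast <- function(history, mu, Phi, m, h) {
  n <- length(history)
  y <- c(history, rep(NA_real_, h))
  lags <- m * seq_along(Phi)
  for (t in (n + 1):(n + h)) {
    y[t] <- mu + sum(Phi * (y[t - lags] - mu))
  }
  y[(n + 1):(n + h)]
}

fc_ar <- seasonal_ar_forecast(history, mu, Phi, m, h = 70)

# For contrast: an additive-seasonal, ETS-style forecast that repeats fixed
# (hypothetical) weekday effects around a constant level every week.
s      <- history[8:14] - mean(history[8:14])
fc_ets <- mu + rep(s, times = 10)

# Range (max minus min) of each forecast week.
week    <- rep(1:10, each = 7)
amp_ar  <- tapply(fc_ar,  week, function(w) max(w) - min(w))
amp_ets <- tapply(fc_ets, week, function(w) max(w) - min(w))
print(round(rbind(seasonal_AR = amp_ar, ETS_like = amp_ets), 1))
```

The weekly range of the seasonal AR forecast shrinks toward zero because the coefficients satisfy the stationarity conditions. The ETS-style range stays constant.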

It looks like your series is just not very forecastable. If you can obtain explanatory information on what drives ticket volumes, you may be able to improve on these pure time series methods. See also the thread "How to know that your machine learning problem is hopeless?"
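One quick way to gauge how forecastable the series is: compare the models' test RMSE (about 43.9 above) with very simple benchmarks. If a flat forecast at the training mean, or a seasonal naive forecast that repeats the last week, comes close, the models are adding little. The sketch below computes both benchmarks in base R. It uses days 131 onward as the test set, and its results are not reproduced here.

```r
tickets <- c( 66, 60, 76, 86, 37, 23, 38,110, 82, 58, 92, 48,  5, 45, 63, 71,
  49, 69, 52, 11, 22, 44, 35, 74, 76, 32, 31,112, 53, 47, 39, 55, 33, 20, 60, 61,
  39, 46, 62, 24, 11, 24, 28, 36, 17, 19, 18, 10, 32, 26, 29, 24, 27, 47,  5, 46, 24,
  40,108, 68, 77, 11, 13, 20, 32, 22, 55, 46,  6, 40, 36, 34, 75, 39, 37, 30, 64, 67,
  47, 63, 33, 35,  2, 42, 45, 30, 29, 16, 20, 12, 33, 50, 67,109, 27,  6,  3,  5, 73,
  80, 58, 30, 59, 77, 60,111, 38, 43,  8, 35, 95, 68, 42, 73, 17, 19, 64, 14, 72,129,
  73, 28,  5, 24, 90, 89, 29, 63, 18,  6, 28, 47, 30, 35,109, 87,  4, 56, 24, 13, 28,
  77, 55, 67, 36, 54, 70,129, 59, 78, 15, 77,116,129,129,129, 56, 32,125,
  86,129,129,129, 83, 69,104, 91, 35, 33, 25, 45,  2, 24, 42, 75, 53, 35, 32)
train <- tickets[1:130]
test  <- tickets[131:length(tickets)]
h     <- length(test)

rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))

# Benchmark 1: a flat forecast at the training mean.
fc_mean <- rep(mean(train), h)
# Benchmark 2: seasonal naive, repeating the last observed week (days 124 to 130).
fc_snaive <- rep_len(tail(train, 7), h)

rmse_mean   <- rmse(test, fc_mean)
rmse_snaive <- rmse(test, fc_snaive)
print(c(mean = rmse_mean, seasonal_naive = rmse_snaive))
```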

The tag wiki for the forecasting tag contains pointers to both introductory and advanced literature, all of it freely accessible.

Stephan Kolassa
  • Thank you. I am running forecasting packages in Python, but I appreciate your response using R. In terms of the data being stationary, do I need to difference it or log it beforehand to make it stationary, and then feed it into an auto_arima() function to correctly find the right parameters? How did you get (0,0,1)(2,0,0)7 for the SARIMA model? Also, would obtaining more data (maybe two years' worth) help with this inaccuracy? – ty101 Aug 02 '22 at 22:22
  • A decent automatic ARIMA fitting method will take care of the standard nonstationarities you see in time series data. For instance, the forecast::auto.arima() function automatically tests for seasonality and trend and chooses the orders of differencing accordingly. It can also apply a Box-Cox transformation via its lambda argument, and it takes care of the back-transformation, which is nontrivial! I simply applied it to your data and got the (0,0,1)(2,0,0)[7] form with a nonzero mean, along with the parameter estimates. I recommend you take a look at pmdarima and similar Python packages... – Stephan Kolassa Aug 03 '22 at 06:22
  • ... More data may be helpful, but to be honest, your series is so noisy that I am not optimistic. If you do get multiple years' worth of history, take a look at methods for multiple seasonalities. To the best of my knowledge, however, these have not been ported to Python, in contrast to standard automatic ARIMA modeling. – Stephan Kolassa Aug 03 '22 at 06:24
  • Nixtla ported autoarima to Python: https://nixtlaverse.nixtla.io/statsforecast/docs/models/autoarima.html – Evgeniy Riabenko Mar 08 '24 at 14:43
  • @EvgeniyRiabenko: thank you, that is good to see! – Stephan Kolassa Mar 08 '24 at 14:52
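To see why the back-transformation mentioned in the comments is nontrivial, consider a log transformation. Exponentiating a mean computed on the log scale does not give the mean on the original scale: it gives the geometric mean, which under a lognormal assumption is the median and always lies below the arithmetic mean. The sketch below shows this on the training data in base R. The bias-adjusted value assumes the data are roughly lognormal, which is an approximation. In the forecast package, the lambda and biasadj arguments of auto.arima() and forecast() deal with this kind of issue.

```r
tickets <- c( 66, 60, 76, 86, 37, 23, 38,110, 82, 58, 92, 48,  5, 45, 63, 71,
  49, 69, 52, 11, 22, 44, 35, 74, 76, 32, 31,112, 53, 47, 39, 55, 33, 20, 60, 61,
  39, 46, 62, 24, 11, 24, 28, 36, 17, 19, 18, 10, 32, 26, 29, 24, 27, 47,  5, 46, 24,
  40,108, 68, 77, 11, 13, 20, 32, 22, 55, 46,  6, 40, 36, 34, 75, 39, 37, 30, 64, 67,
  47, 63, 33, 35,  2, 42, 45, 30, 29, 16, 20, 12, 33, 50, 67,109, 27,  6,  3,  5, 73,
  80, 58, 30, 59, 77, 60,111, 38, 43,  8, 35, 95, 68, 42, 73, 17, 19, 64, 14, 72,129,
  73, 28,  5, 24, 90, 89, 29, 63, 18,  6, 28, 47, 30, 35,109, 87,  4, 56, 24, 13, 28,
  77, 55, 67, 36, 54, 70,129, 59, 78, 15, 77,116,129,129,129, 56, 32,125,
  86,129,129,129, 83, 69,104, 91, 35, 33, 25, 45,  2, 24, 42, 75, 53, 35, 32)
train <- tickets[1:130]

# The training data contain no zeros (the minimum is 2), so the log is defined.
z     <- log(train)
m_log <- mean(z)
v_log <- var(z)

naive_back <- exp(m_log)              # geometric mean: median under lognormality
bias_adj   <- exp(m_log + v_log / 2)  # lognormal mean approximation

print(c(arithmetic_mean = mean(train), naive_back = naive_back,
        bias_adjusted = bias_adj))
```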