
I would like to forecast car rentals (a count time series). I am given hourly, integer-valued car rental counts for a one-month period from 24 September to 24 October, and I need to forecast car rental demand from 25 October to 31 October.

The given data is shown below:

[Figure: hourly car rental counts, 24 September to 24 October]

The ground-truth data for the forecast period (25–31 October) is shown below:

[Figure: ground-truth hourly counts, 25–31 October]

I will describe my approach below. Please correct me and give your suggestions.

I am using the tscount package in R (https://cran.r-project.org/web/packages/tscount/index.html) to model this data with a negative binomial distribution whose conditional mean is time varying.

$$Y_t \mid \mathcal{F}_{t-1} \sim \text{NegBin}(\lambda_t, \phi), \qquad \mathrm{E}(Y_t \mid \mathcal{F}_{t-1}) = \lambda_t$$

The log of the conditional mean is a linear combination of logs of past observations and past (log) conditional means, as shown below.

$$\nu_t = \log(\lambda_t) = \beta_0 + \sum_{k=1}^{p} \beta_k \log(Y_{t-i_k} + 1) + \sum_{\ell=1}^{q} \alpha_\ell \nu_{t-j_\ell}$$

Note: the shape parameter $\phi$ controls the dispersion/variance of the negative binomial distribution, but it is NOT time varying.

First I computed the ACF of the given data and it is shown below:

[Figure: ACF of the hourly car rental series]

I save the lags corresponding to the 10 largest and 10 smallest ACF values.

Similarly, I plot the PACF of the given data and extract the lags corresponding to the 10 largest and 10 smallest PACF values. I then take the unique lags from the union of the ACF and PACF lags; these are the past-observation lags $i_1, \dots, i_p$ in the above equation, and I use the same lags for the past means, i.e. I set $q = p$.

[Figure: PACF of the hourly car rental series]
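A minimal sketch of this lag-selection step, assuming the series is stored in `ts175_mon1` and that lags up to one week (168 hours) are considered (the 168-hour cutoff is my assumption):

    # ACF/PACF values up to one week of hourly lags (lag 0 dropped from the ACF)
    max_lag   <- 168
    acf_vals  <- acf(ts175_mon1, lag.max = max_lag, plot = FALSE)$acf[-1]
    pacf_vals <- pacf(ts175_mon1, lag.max = max_lag, plot = FALSE)$acf

    # lags with the 10 largest and 10 smallest ACF and PACF values
    top_acf  <- order(acf_vals,  decreasing = TRUE)[1:10]
    bot_acf  <- order(acf_vals)[1:10]
    top_pacf <- order(pacf_vals, decreasing = TRUE)[1:10]
    bot_pacf <- order(pacf_vals)[1:10]

    # unique union of the selected lags, used for both past_obs and past_mean
    reg_lags <- sort(unique(c(top_acf, bot_acf, top_pacf, bot_pacf)))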

Then I estimate the parameters of the model using:

    ts_fit_175_mon1 <- tscount::tsglm(ts = ts175_mon1, link = "log",
        model = list(past_obs = reg_lags, past_mean = reg_lags), distr = "nbinom")

    summary(ts_fit_175_mon1)
    tscount::scoring(ts_fit_175_mon1)

The resulting fit on the given data is shown in orange, and the scoring rule values are given below.

[Figure: in-sample fit (orange) overlaid on the observed series]

[Output: scoring rule values from tscount::scoring]
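For reference, a minimal sketch of how the one-week-ahead forecast (25–31 October, i.e. 168 hours) can be obtained from the fitted model with tscount's `predict` method; the 90% level is an arbitrary choice:

    h  <- 7 * 24                                          # 168 hourly steps
    fc <- predict(ts_fit_175_mon1, n.ahead = h, level = 0.9)
    fc$pred      # point forecasts (predicted conditional means)
    fc$interval  # prediction intervals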

The forecast is shown below in orange:

[Figures: forecast (orange) for 25–31 October, shown against the ground truth]

I would like to understand what the problems with this forecast are and how to improve it. Why is the predicted peak much higher than the observed peaks? Some of the peaks in the ground truth are not covered. What is being missed in the forecast?

I tried to increase the number of parameters by including the 20 largest and smallest lags from the ACF and PACF plots, but this leads to increased peaking and worse scoring values. How can I make better use of the ACF and PACF plots?

When should I be satisfied with my forecast? When I achieve the lowest scoring value among the models I try?

Is there any step that I am missing before modelling?

Thanks.

1 Answer


Your ACF has strong spikes at lags 24, 48, 72, 96 and so on, which is not surprising, since this is driven by intraday seasonality. Consider modeling this, e.g., by including lag 24.
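For instance, a sketch of such a minimal specification with only the daily seasonal lag (object names follow the question's code):

    # daily seasonality only: condition on the observation 24 hours earlier
    fit_seasonal <- tscount::tsglm(
      ts    = ts175_mon1,
      model = list(past_obs = 24),
      link  = "log",
      distr = "nbinom"
    )
    tscount::scoring(fit_seasonal)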

In general, I would start out with a much simpler model. Start with a seasonal model as above, and nothing else. Then add a single lag 1. Then a lag 2. In forecasting, very simple models surprisingly often outperform more complex ones. Compare your model to a seasonal naive model (which simply forecasts the last observation for the corresponding hour in the future). Because of the seasonality, I would not try the even simpler nonseasonal naive model. Also, try fitting an exponential smoothing model. Yes, it's misspecified for count data, but it may still be competitive.
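A sketch of these two benchmarks; using the forecast package here is my suggestion, and `frequency = 24` encodes the daily cycle:

    library(forecast)

    y <- ts(as.numeric(ts175_mon1), frequency = 24)  # hourly series, daily period
    h <- 7 * 24

    fc_snaive <- snaive(y, h = h)          # repeat the last observed value for each hour of the day
    fc_ets    <- forecast(ets(y), h = h)   # exponential smoothing benchmark

    # compare fc_snaive$mean and fc_ets$mean against the held-out week 25-31 October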

You may also have multiple seasonalities in your data, with weekends differing from weekdays. The tag wiki contains pointers to models for such cases. Again, such a more complex model will not necessarily yield better forecasts.
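One way to let weekends differ is a weekend indicator passed through `tsglm`'s `xreg` argument. A sketch; the timestamps (including the year) are hypothetical and must match your actual data, and `predict` then needs a matching `newxreg` for the forecast period:

    # hypothetical hourly timestamps for the training window (year assumed)
    times   <- seq(as.POSIXct("2017-09-24 00:00:00"), by = "hour",
                   length.out = length(ts175_mon1))
    weekend <- as.numeric(format(times, "%u") %in% c("6", "7"))  # Sat/Sun = 1

    fit_wk <- tscount::tsglm(
      ts    = ts175_mon1,
      model = list(past_obs = c(1, 24)),
      xreg  = weekend,
      link  = "log",
      distr = "nbinom"
    )
    # forecasting: predict(fit_wk, n.ahead = 168, newxreg = <weekend indicator for 25-31 Oct>)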

why is the predicted peak much higher?

To find this out, you would need to recalculate the forecast based on the parameter estimates, and then you would need to understand just why the parameters were estimated the way they were. In general, strongly varying forecasts without underlying causal drivers hint at overfitting. Simplify your model.
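As a starting point for that diagnosis (a sketch), inspect the estimated coefficients and the in-sample conditional means:

    coef(ts_fit_175_mon1)      # intercept, past_obs and past_mean coefficients
    summary(ts_fit_175_mon1)   # estimates with standard errors
    fitted(ts_fit_175_mon1)    # fitted conditional means lambda_t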

some of the peaks in the ground truth are not covered.

That is not surprising. Predictions should vary less than observations, because modeling tries to separate explainable from unexplainable variation and to forecast only the explainable part; the rest is noise. Peaks in the observations are evidence of unexplained noise, unless you have causal drivers that allow you to explain and predict them, of course.

when should I be satisfied with my forecast?

See: How to know that your machine learning problem is hopeless?

Stephan Kolassa