
I have c. 100,000 independent time series (they are not multivariate time series) of monthly retail sales for high street shops. They are independent because:

  • Shops are in different locations.
  • The time series run over different time periods in different years.

To find a common model that best fits them all (assuming sales follow a similar pattern across shops), this answer here suggests applying a grid search across SARIMA order combinations, from (s, p, d, q) = (0, 0, 0, 0) up to, for example, (2, 5, 2, 5), and then scoring each with a scale-independent error measure such as MASE.
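
For reference, a minimal sketch of that per-series grid search with statsmodels might look like the following (the series names, the single max_order limit, and the hold-out split are illustrative, not my actual setup):

    import itertools
    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def mase(y_true, y_pred, y_train, m=12):
        # Mean Absolute Scaled Error: forecast MAE scaled by the in-sample
        # seasonal-naive MAE (m = 12 for monthly data).
        scale = np.mean(np.abs(np.asarray(y_train)[m:] - np.asarray(y_train)[:-m]))
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale

    def grid_search_orders(y_train, y_test, max_order=2, m=12):
        # Try every (p, d, q, P, D, Q) combination up to max_order and keep
        # the one with the lowest MASE on the hold-out period.
        best_orders, best_score = None, np.inf
        for p, d, q, P, D, Q in itertools.product(range(max_order + 1), repeat=6):
            try:
                fit = SARIMAX(y_train, order=(p, d, q),
                              seasonal_order=(P, D, Q, m)).fit(disp=False)
            except Exception:
                continue  # skip combinations that fail to estimate
            pred = fit.forecast(steps=len(y_test))
            score = mase(y_test, pred, y_train, m=m)
            if score < best_score:
                best_orders, best_score = (p, d, q, P, D, Q), score
        return best_orders, best_score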

However, this doesn't work for my problem as:

  1. Independently estimating a SARIMAX model per series will return different SARIMA orders and coefficients (s, p, d, q) for each time series, whereas the final model should have the same coefficients across all time series.
  2. Similarly, the coefficients of the explanatory variables will differ across time series when, again, the final model should have the same coefficients across all of them.

The brute-force approach would be to grid search not only across SARIMA orders but also across the SARIMA coefficients and the explanatory variables' coefficients. That is of course computationally infeasible on my machine.
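
What I can do cheaply is score one fixed specification and coefficient vector against every shop without re-estimating anything, using statsmodels' filter. A rough sketch of that idea is below; series_by_shop and shared_params are illustrative names, and shared_params would have to come from somewhere (for example, a fit on one representative shop, or an average of per-shop estimates):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def pooled_mase(series_by_shop, order, seasonal_order, shared_params, m=12):
        # Apply one common SARIMA specification and one fixed coefficient vector
        # (its length must match the parameter count implied by the chosen orders)
        # to every shop's series, and average the hold-out MASE over shops.
        scores = []
        for shop, y in series_by_shop.items():
            y = np.asarray(y, dtype=float)
            y_train, y_test = y[:-m], y[-m:]          # hold out the last year
            model = SARIMAX(y_train, order=order, seasonal_order=seasonal_order)
            res = model.filter(shared_params)         # fixed params, no re-estimation
            pred = np.asarray(res.forecast(steps=len(y_test)))
            scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
            scores.append(np.mean(np.abs(y_test - pred)) / scale)
        return float(np.mean(scores))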

I am looking for help on how to approach this problem, preferably with Python packages, or with R if Python is not feasible.

Thank you.

  • (1) What are your time series? I find it hard to believe you have total store sales for 100,000 stores (no single retailer has so many stores), so I assume these are some SKU/store combination, or something. (2) Why do you want a common model order, let alone common parameter estimates? Even just the intercept should be different if your time series are on different levels. – Stephan Kolassa Jun 28 '23 at 11:12
  • Hi @StephanKolassa, the data comes directly from the payment processing companies. They are all small independent merchants.

    The reason why I want a common model order is that the purpose of fitting this model is to predict store sales for stores outside of those 100k in the data set. We may be facing a new store with only three months of data (but similar data is available), and the idea is to give a rough estimate of where we think sales will be, as well as confidence intervals for those predictions.

    – Alejandro Leno Jul 03 '23 at 14:07
  • OK, I see your point. To be quite honest, I would not so much look to (S)ARIMA in this case, but rather to some global model. For instance, you could fit a regression with characteristics of the store (square footage, type or what product/service they sell, ...), as well as monthly dummies and perhaps a trend - potentially including interactions between the predictors. This effortlessly handles input time series of different lengths, and you can immediately evaluate it for a prediction. Possibly use a mixed model to account for early sales. – Stephan Kolassa Jul 03 '23 at 16:08
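
For what it's worth, a minimal sketch of the global-model idea from the comment above could look like this in statsmodels, assuming a long-format DataFrame with one row per shop-month and hypothetical columns sales, shop_id, month (1-12), months_open, sqft and segment:

    import statsmodels.formula.api as smf

    def fit_global_models(df):
        # df: long-format frame with one row per shop-month; the column names
        # used below (sales, month, months_open, sqft, segment, shop_id) are
        # illustrative stand-ins for whatever store characteristics exist.
        formula = "sales ~ C(month) + months_open + sqft + C(segment)"

        # Pooled regression: one coefficient set shared by every shop, with
        # monthly dummies for seasonality and a simple "age of shop" trend.
        pooled = smf.ols(formula, data=df).fit()

        # Mixed-effects variant: same shared fixed effects, plus a random
        # intercept per shop so that shops on different sales levels are
        # handled without fitting a separate model per shop.
        mixed = smf.mixedlm(formula, data=df, groups=df["shop_id"]).fit()
        return pooled, mixed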

0 Answers