I have c. 100,000 independent time series (they are not a multivariate time series) of monthly retail sales for high street shops. They are independent because:
- Shops are in different locations.
- The time series run over different time periods in different years.
To find a common model that best fits them all (assuming sales follow a similar pattern across shops), this answer here suggests running a grid search over SARIMA orders, from (s=0, p=0, d=0, q=0) up to, for example, (2, 5, 2, 5), and then comparing the candidates with a scale-independent error measure such as MASE.
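For reference, a minimal sketch of MASE as I understand it: the out-of-sample mean absolute error divided by the in-sample mean absolute error of a naive forecast. The function name `mase` and the seasonal-lag parameter `m` are my own illustrative choices (with `m=12` being one option for monthly data), not something taken from the linked answer.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    train    -- in-sample observations used to fit the model
    actual   -- held-out observations
    forecast -- model forecasts for the held-out period
    m        -- lag of the naive benchmark (1 = plain naive, 12 = seasonal naive
                for monthly data); this parameter is an illustrative assumption
    """
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    if len(train) <= m:
        raise ValueError("train must contain more than m observations")

    # In-sample MAE of the naive forecast y_hat[t] = y[t - m]
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive forecast is perfect in-sample; MASE is undefined")

    # Out-of-sample MAE of the model being evaluated
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Two shops with the same pattern but sales levels 100x apart
small = mase([10, 12, 14, 16], [18, 20], [17, 22])
large = mase([1000, 1200, 1400, 1600], [1800, 2000], [1700, 2200])
print(small, large)  # both 0.75, so scores are comparable across shops
```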
However, this doesn't work for my problem as:
- Independently estimating SARIMAX models will return different SARIMA (s,p,d,q) orders and coefficients for each time series. The final model should have the same orders and coefficients across all time series.
- Similarly, the coefficients for the explanatory variables will differ across time series, when (again) the final model should have the same coefficients across all time series.
The brute-force method here would be to grid search over the SARIMA orders, the SARIMA coefficients for each order, and the explanatory variables' coefficients. This is of course not feasible on my machine.
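To give a rough sense of scale, the sketch below counts only the order combinations, before any coefficients are searched. It uses the example upper bound (2, 5, 2, 5) and the c. 100,000 series mentioned above; the variable names are illustrative.

```python
from itertools import product

max_order = (2, 5, 2, 5)   # example upper bound for (s, p, d, q) from above
n_series = 100_000         # approximate number of shops

# Every order combination from (0, 0, 0, 0) up to and including max_order
grid = list(product(*(range(k + 1) for k in max_order)))

n_orders = len(grid)                 # 3 * 6 * 3 * 6 = 324
total_fits = n_orders * n_series     # one fit per order per series

print(n_orders, total_fits)  # 324 32400000
```

That is already about 32 million model fits for the orders alone, and the coefficients are continuous, so they cannot be enumerated on a grid without also choosing a discretisation.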
I'm looking for help on how to approach this problem, preferably using Python packages or, if that is not possible, R.
Thank you.
The reason I want a common model order is that the purpose of fitting this model is to predict sales for stores outside the 100k in the data set. We may be facing a new store with only three months of data (but similar data available), and the idea is to give a rough estimate of where we think sales will be, as well as what we think the confidence intervals will be for those predictions.
– Alejandro Leno Jul 03 '23 at 14:07