
I have a dataset of almost 10,000 shops with the features 'date', 'shop ID', and 'sales amount', covering almost 2 years of data per shop. I want to forecast the sales amount for each shop for the next 30 days. I have to do this on a CPU (laptop), with no cloud and no GPU. I can manage this for 10 to 20 shops, but I don't see how to do it for 10,000 shops.

I am adding an example dataset here.

[Image: example dataset with shop ID, date, and sales amount columns]

The figure above shows the shop ID, date, and sales amount columns. Each shop ID has almost 2 years of data, and there are about 10,000 such shops.

Current situation: I have aggregated all shops' sales by date and predicted the next 30 days using ARIMA, but I am not sure how to do this at the individual shop level.
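For reference, this is roughly what I have so far. It is a minimal sketch assuming pandas and statsmodels; the file name and column names ('shop_id', 'date', 'sales') are placeholders for my actual ones.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder file and column names.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Aggregate all shops into a single daily series.
total = df.groupby("date")["sales"].sum().asfreq("D").fillna(0)

# Fit one ARIMA on the aggregate and forecast the next 30 days.
model = ARIMA(total, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=30)
print(forecast)
```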

Please help.

Thank you.

R June
  • I feel that your question is a bit unclear; can you provide a sample of what your data looks like, and maybe a code snippet of what you've tried so far? – Stefan Popov Dec 16 '22 at 21:04
  • What would you do if you had access to those computational resources or if you only had a small number of shops for which you had to make forecasts? – Dave Dec 16 '22 at 21:23
  • If I had cloud resources, I would probably use the ARIMA model that I already trained on the whole dataset. – R June Dec 17 '22 at 00:35
  • I have edited the question; hopefully it now explains my situation. – R June Dec 17 '22 at 02:07

1 Answer


Predicting 30 days ahead for 10,000 shops on a CPU with no GPU is difficult, but you can try different approaches to get the most out of the data and make consistent predictions:

  • Start with a correlation map to find similar shops. If you find groups of shops with similar behavior, you may only have to train one model per group.
  • In addition, you could visualize the 10,000 shops using a dimensionality reduction algorithm like UMAP and detect relevant clusters. This is also useful for extracting features and dependencies on other important variables that could improve your predictions. I recommend starting with 1,000 shops to tune the parameters and prepare the data correctly (see the clustering sketch after this list).
  • Once you have clear clusters, you can apply ARIMA to some of them, as they probably have seasonality (which is where ARIMA performs best); the second sketch below fits one ARIMA per cluster.
  • For the shops that fall outside the clusters, or where ARIMA fails, you can apply another algorithm better suited to noisy data, such as Exponential Smoothing or Random Forest; an Exponential Smoothing fallback is included in that second sketch.
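To illustrate the grouping step, here is a minimal sketch. It assumes a long-format table with 'shop_id', 'date', and 'sales' columns and the umap-learn and scikit-learn packages; the file name, subsample size, and cluster count are placeholder choices, not prescriptions.

```python
import pandas as pd
import umap
from sklearn.cluster import KMeans

# Placeholder file and column names.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Pivot to one row per shop and one column per date; fill gaps with 0.
wide = df.pivot_table(index="shop_id", columns="date",
                      values="sales", fill_value=0)

# Quick similarity check on a subsample (a full 10,000 x 10,000
# correlation matrix is heavy, so start with ~1,000 shops).
sample = wide.sample(n=1000, random_state=0)
corr = sample.T.corr()  # shop-by-shop correlation matrix

# Embed every shop's sales curve in 2D with UMAP, then cluster the embedding.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(wide)
wide["cluster"] = KMeans(n_clusters=20, random_state=0).fit_predict(embedding)
```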
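And once the clusters exist, a rough sketch of fitting one model per cluster, with Exponential Smoothing as the fallback when ARIMA fails. It continues from the `wide` frame above; the ARIMA order and trend settings are placeholders to be tuned.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

forecasts = {}
for cluster_id, group in wide.groupby("cluster"):
    # Average the shops in the cluster into one representative daily series.
    series = group.drop(columns="cluster").mean(axis=0)
    series.index = pd.to_datetime(series.index)
    series = series.asfreq("D").fillna(0)
    try:
        fit = ARIMA(series, order=(1, 1, 1)).fit()
        forecasts[cluster_id] = fit.forecast(steps=30)
    except Exception:
        # Fall back to additive exponential smoothing for awkward series.
        fit = ExponentialSmoothing(series, trend="add").fit()
        forecasts[cluster_id] = fit.forecast(30)
```

Each cluster-level forecast then serves every shop in that cluster, which keeps the number of fitted models in the tens instead of the thousands.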

In conclusion, starting from a general approach that groups shops with similar data, extracting knowledge from those groups, and applying one prediction model per group could be a good solution when computing power is limited.

Nicolas Martin