I have highly seasonal data with a little more than 1000 observations and a seasonal cycle of 67. I am using auto_arima in Python, and when I run a stepwise search I get a memory error after only a few models have been fitted. Why could this be happening?

Plot of the data with an earlier ARIMA model with Fourier terms that I used: https://i.stack.imgur.com/7TzeF.jpg

Tempo
  • Please don't simply repost a deleted question, especially not if it contained crucial information. Your plot shows a quite competitive forecast, if all you have is the time series itself, except that the lowest points are badly over-forecasted. Your best bet is likely not to tweak ARIMA, but to invest any domain knowledge you have. If you can share any more information here, we may be able to help you. See https://stats.stackexchange.com/q/222179/1352 – Stephan Kolassa Apr 19 '23 at 12:26
  • Sorry for reposting the question; I thought my older question wasn’t very concise. The series is of dropout rates from an institution by year, and every year has 67 records. This is the domain knowledge I have, so I was trying to implement a SARIMA model with a seasonal period of 67. I am not sure why auto_arima fails midway through the process. – Tempo Apr 19 '23 at 13:03
  • Hm. Why does every year have 67 records? A year has 52 or 53 weeks, so these can't be weeks... I find that rather strange. ARIMA is known for having problems with "long" seasonality, and one would indeed typically address this using harmonics or other predictors. In your case, for the very low points, Boolean predictors might be useful, essentially in a regression with ARIMA errors. – Stephan Kolassa Apr 19 '23 at 13:08
  • Yes, it’s not really a time series, just observations with a cyclical pattern. Is 67 considered a long seasonal period? I thought that if ARIMA could handle a weekly period, a period of 67 should be all right. I will read up on Boolean predictors, thanks – Tempo Apr 19 '23 at 13:54
  • 67 is borderline long. 365 (daily data, yearly seasonality) is definitely long, 48 (half-hourly data, daily seasonality) or 52 (weekly data, yearly seasonality) can already be long, see Hyndman's post. ARIMA works reasonably well for cycle lengths of 4 (quarterly data, yearly seasonality), 7 (daily data, weekly seasonality) or 12 (monthly data, yearly seasonality). 24 (hourly data, daily seasonality) is already iffy. – Stephan Kolassa Apr 19 '23 at 13:58
  • While I agree with Stephan Kolassa's comments, I wonder if the Python implementation of ARIMA handles memory efficiently. Have you tried R instead? – Richard Hardy Apr 19 '23 at 14:03
  • @RichardHardy I am not able to install it in my workspace. I will try using R at home. – Tempo Apr 19 '23 at 14:06
  • (Not to say this will help solve the actual problem, as Stephan Kolassa has made some valid points. It is more about the curiosity of finding out whether Python is as efficient as R in this respect.) – Richard Hardy Apr 19 '23 at 14:09
  • @RichardHardy I understand; still, I am curious to try it as well – Tempo Apr 19 '23 at 14:11
  • @StephanKolassa When I limit the stepwise search with pmdarima's StepwiseContext, the search is shorter but still finds a model and doesn’t run out of memory. Is that a valid solution? – Tempo Apr 20 '23 at 08:25
  • I assume this does the greedy search for an optimal model. Yes, I would say that is perfectly valid - running through all possible ARIMA models usually takes very long, and the greedy approach usually is perfectly fine. It's the default setup in the R implementation in the forecast and fable packages, which I would regard as the gold standard. – Stephan Kolassa Apr 20 '23 at 08:35
  • @StephanKolassa It does indeed work very well. I am wondering why limiting the stepwise search solves the memory issue. Do you happen to know why a longer search would be so much more demanding on my machine? – Tempo Apr 20 '23 at 13:48
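
The harmonics and Boolean predictors suggested in the comments can be sketched as plain regressor columns. This is only an illustration under stated assumptions: the number of harmonics (3) and the cycle positions flagged as "low points" (0 and 66) are hypothetical placeholders, not values taken from the data, and the function names are made up for this example. The resulting columns would be passed as exogenous regressors to a non-seasonal ARIMA, i.e. a regression with ARIMA errors.

```python
import math


def fourier_terms(n_obs, period, n_harmonics):
    """Return n_obs rows of [sin_1, cos_1, ..., sin_K, cos_K] for one period."""
    if not 1 <= n_harmonics <= period // 2:
        raise ValueError("n_harmonics must be between 1 and period // 2")
    rows = []
    for t in range(n_obs):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        rows.append(row)
    return rows


def cycle_position_dummies(n_obs, period, positions):
    """Return one 0/1 column per listed position within the cycle."""
    return [[1 if t % period == p else 0 for p in positions] for t in range(n_obs)]


PERIOD = 67            # cycle length from the question
N_OBS = 15 * PERIOD    # 1005 rows, i.e. "a little more than 1000" (assumed)

harmonics = fourier_terms(N_OBS, PERIOD, n_harmonics=3)  # 3 is a hypothetical choice
dummies = cycle_position_dummies(N_OBS, PERIOD, positions=[0, 66])  # hypothetical lows

# Combine both sets of columns into one regressor matrix (list of rows).
X = [h + d for h, d in zip(harmonics, dummies)]
print(len(X), len(X[0]))  # 1005 8
```

Because the seasonality is carried by these columns, the ARIMA part itself no longer needs a seasonal period of 67, which is the usual reason for preferring harmonics with long cycles.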

1 Answer

Limiting the stepwise search duration with pmdarima's StepwiseContext solved the memory error.
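
For reference, a minimal sketch of what that looks like. The limits (10 steps, 120 seconds) are placeholder values, not the ones I actually used, and the synthetic series only stands in for the real data. It shows how to cap the search; it does not explain why the unrestricted search ran out of memory.

```python
import math
import random


def make_cyclic_series(n_obs=1005, period=67, seed=0):
    """Synthetic stand-in for the real data: one sine cycle per period plus noise."""
    rng = random.Random(seed)
    return [
        10.0 + 3.0 * math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, 0.5)
        for t in range(n_obs)
    ]


def fit_with_limited_search(y, period, max_steps=10, max_dur=120):
    """Run a stepwise auto_arima search capped by StepwiseContext."""
    # Imported here so this module still loads where pmdarima is not installed.
    import pmdarima as pm
    from pmdarima.arima import StepwiseContext

    # max_steps caps the number of candidate models tried;
    # max_dur caps the search time in seconds.
    with StepwiseContext(max_steps=max_steps, max_dur=max_dur):
        return pm.auto_arima(
            y,
            seasonal=True,
            m=period,
            stepwise=True,
            error_action="ignore",
            suppress_warnings=True,
        )


# Usage (needs pmdarima; can be slow for a period of 67):
# model = fit_with_limited_search(make_cyclic_series(), period=67)
# print(model.summary())
```

The context manager only affects auto_arima calls made inside the `with` block, so the same script can still run an unrestricted search elsewhere.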

Tempo