
I have been working with time-series data. I haven't been able to find a way to determine automatically whether a given time series shows seasonal behaviour. By "automatically" I mean an algorithm that takes the time series as input and returns True or False, instead of someone having to inspect a plot manually.

Does anyone have any code suggestions (preferably in Python) or references to good papers on the matter? If it is applicable to big data, even better; if not, anything will help.

Johanna
  • What the paper that Stephan Kolassa referenced does is calculate the ACF of the detrended series and take the first peak as the frequency (if it is at least 0.1 larger than the trough before/after it). So that is something you should be able to code. What Ben suggested in the other thread is to move to the frequency domain. I guess you could decide there, too, whether the coefficient for a certain frequency is big enough to treat it as a seasonality. – Maverick Meerkat Dec 05 '22 at 14:54

1 Answer


Section 3.2 in the following paper offers a possibility for determining the length of the seasonal cycle:

 Wang, X, Smith, KA, Hyndman, RJ (2006) "Characteristic-based
 clustering for time series data", _Data Mining and Knowledge
 Discovery_, *13*(3), 335-364.

However, note that this was never included in the forecast::auto.arima() function (whose author is Hyndman), although auto.arima() does use other methods from that paper (for instance, it decides whether to apply seasonal differencing for a known seasonal cycle length based on an estimate of seasonal strength, as also given in Wang et al.).

I do not know why this was never included. It may have been because it was unstable and hard to automate. After all, you need to identify peaks and troughs in the ACF, and what constitutes a "peak" or a "trough" in a noisy ACF series would need to be operationalized.

Alternatively, perhaps there simply never was any demand for it, since users presumably know their seasonal cycle length.

So if you want to use the cycle length determination per Wang et al., you would need to code it yourself.
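As a starting point, here is a minimal Python sketch of that idea. It follows the rule as summarized in the comments on the question (ACF of the detrended series, first peak that is at least 0.1 above the preceding trough), not the paper's exact procedure. Several choices are assumptions made for illustration: the function names `acf`, `find_cycle_length` and `is_seasonal` are hypothetical, the trend is removed with a low-order polynomial fit, only lags up to a third of the series length are searched, and a peak must have positive autocorrelation to count.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        # Constant input: define the ACF as 1 at lag 0 and 0 elsewhere.
        out = np.zeros(max_lag + 1)
        out[0] = 1.0
        return out
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


def find_cycle_length(y, min_diff=0.1, poly_degree=2):
    """Estimate the seasonal cycle length; return 1 if none is found."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n)

    # Assumption: a low-order polynomial is an adequate trend model.
    coefs = np.polyfit(t, y, poly_degree)
    resid = y - np.polyval(coefs, t)

    # Nothing left after detrending: treat the series as non-seasonal.
    if np.allclose(resid, 0.0, atol=1e-9 * max(1.0, np.abs(y).max())):
        return 1

    max_lag = n // 3  # assumption: search lags up to a third of the length
    r = acf(resid, max_lag)

    last_trough = None
    for k in range(1, max_lag):
        if r[k] < r[k - 1] and r[k] <= r[k + 1]:
            last_trough = r[k]  # local minimum of the ACF
        elif r[k] > r[k - 1] and r[k] >= r[k + 1]:
            # Local maximum: accept it only if it follows a trough, is
            # positive, and exceeds that trough by at least min_diff.
            if (
                last_trough is not None
                and r[k] > 0
                and r[k] - last_trough >= min_diff
            ):
                return k
    return 1


def is_seasonal(y, **kwargs):
    """True if a cycle length greater than 1 was detected."""
    return find_cycle_length(y, **kwargs) > 1
```

Calling `is_seasonal(series)` gives the True/False answer asked for in the question. On short or noisy series, chance fluctuations in the ACF may satisfy the 0.1 threshold, so `min_diff` and the detrending step will likely need tuning for your data.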

Stephan Kolassa
  • Is there a way to extract the seasonality ($m$) number from the auto.arima output? I couldn't find it – Maverick Meerkat Dec 05 '22 at 14:48
  • @MaverickMeerkat: you would typically just extract that from the time series you fed into auto.arima(), using frequency(). Alternatively, you can of course query the in-sample fit from your model and extract the frequency from that: first model <- auto.arima(...), then frequency(model$fit). – Stephan Kolassa Dec 05 '22 at 16:10
  • That does not seem to work unless the data already has a known frequency. The question is about an unknown frequency. If I strip the data of this meta-information, then frequency(data) or frequency(fit) will be 1. – Maverick Meerkat Dec 05 '22 at 19:26
  • @MaverickMeerkat: Well, yes. I obviously misunderstood what you were looking for. auto.arima() does not detect or fit the seasonal cycle length, it's an input parameter. "Detecting" this would also be quite hard to do based on the data alone. You can probably do something using Fourier transformations. But I would consider "detecting" seasonality in a series rather strange. You should really have enough understanding of your domain to be able to set the seasonal cycle length. – Stephan Kolassa Dec 05 '22 at 19:51
  • I'm confused. Your answer quotes a paper that explains exactly how auto.arima estimates the seasonal cycle (using the 1st peak of the ACF of the detrended series, p.342 section 3.2). Also, when I use the stripped data and fit it using auto.arima, I get a clearly seasonal forecast. – Maverick Meerkat Dec 05 '22 at 20:09
  • Here's a reproducible example: library(fpp3); library(forecast); aus_holidays <- tourism %>% filter(Purpose == "Holiday") %>% summarise(Trips = sum(Trips)/1e3); xx = aus_holidays$Trips; fit <- auto.arima(xx); frequency(fit); plot(forecast(fit,h=20)); abline(v=81:100) – Maverick Meerkat Dec 05 '22 at 20:35
  • @MaverickMeerkat: thank you, I think you found a glaring error in my answer above. Specifically, the Wang, Smith & Hyndman (2006) paper describes two seasonal topics: detecting the cycle length in section 3.2 (which you refer to), and a measure of seasonal strength given a cycle length. The documentation to auto.arima() refers to this paper, but the reference is not to the determination of cycle length, but to the measurement of seasonal strength. ... – Stephan Kolassa Dec 05 '22 at 20:45
  • ... Based on the changelog, it also does not look like this functionality was removed at some point. It simply never was there. As to why this functionality was never added, I don't know. It may have been because it was unstable, varying and hard to automate. Or perhaps there simply never was any demand for it, since users presumably know their seasonal cycle length. So if you want to use the cycle length determination per Wang et al., you would need to code it yourself. I'll edit my answer. – Stephan Kolassa Dec 05 '22 at 20:47
  • Yes, I tried now with different data and seasonality of 12, and it failed to produce seasonal forecasts. So, I guess it's not really estimating the seasonal period. – Maverick Meerkat Dec 05 '22 at 21:24