How to find the "closest" (in the sense of data generating process) time series?

Question

Suppose we have $m$ time series, each with $n$ observations. We also have another time series with $n-k$ observations ($k>0$). Given this shorter series, I want to find those of the $m$ series whose data generating process (DGP) is closest to it. The motivation is to "recover" the history of the shorter series from the available series with longer histories. Off the top of my head, I can think of looking at the correlation coefficients between the series, or of estimating some model (e.g. AR/ARMA) for each series and comparing the coefficients. Is there a documented approach for this kind of exercise?

Maybe https://stats.stackexchange.com/questions/172439/comparing-clustering-time-series-with-unequal-lengths? – Alberto Mar 28 '24 at 14:20
I think this is closer, but it still does not fully answer my question: https://stats.stackexchange.com/questions/19103/how-to-statistically-compare-two-time-series – Sane Mar 28 '24 at 14:27

score 2 · Accepted Answer · answered Mar 29 '24 at 05:39

You will have to define a class of data generating processes, or the problem is not well-defined. If you are willing to restrict attention to invertible ARIMA processes, for example, then the Piccolo distance might do what you want.

See https://www.jstatsoft.org/article/view/v062i01 for a general discussion of distances between time series.
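To make the idea concrete: the Piccolo distance is the Euclidean distance between the AR(∞) coefficient sequences of the fitted models. For pure AR(p) fits those sequences are simply the AR coefficients (padded with zeros), so the distance depends only on the estimated coefficients and series of unequal length pose no problem. The Python sketch below is a rough illustration under stated assumptions, not a definitive implementation: it assumes the same fixed order `p` for every series and a plain least-squares fit, and the helper names `fit_ar`, `piccolo_distance`, `rank_candidates` and `simulate_ar1` are hypothetical, as are the simulated data.

```python
import numpy as np


def fit_ar(x, p):
    """Least-squares estimate of AR(p) coefficients (series is mean-centred first)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Column j-1 of the design matrix holds the lag-j values x[t-j] for t = p, ..., n-1.
    lags = np.column_stack([x[p - j: len(x) - j] for j in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return coef


def piccolo_distance(x, y, p=3):
    """Euclidean distance between the AR(p) coefficient vectors of two series.

    Assumption: both series are approximated by AR models of the same order p,
    so no zero padding is needed before taking the difference.
    """
    return float(np.linalg.norm(fit_ar(x, p) - fit_ar(y, p)))


def rank_candidates(short, candidates, p=3):
    """Return (index, distance) pairs for the candidates, closest first."""
    dists = [(i, piccolo_distance(short, c, p)) for i, c in enumerate(candidates)]
    return sorted(dists, key=lambda pair: pair[1])


def simulate_ar1(phi, n, rng):
    """Simulate an AR(1) series with unit-variance Gaussian innovations (illustration only)."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


# Illustrative data: one short series and three longer candidates.
rng = np.random.default_rng(0)
short_series = simulate_ar1(0.8, 300, rng)        # n - k observations
candidates = [
    simulate_ar1(-0.5, 400, rng),                 # different dynamics
    simulate_ar1(0.8, 400, rng),                  # same DGP as the short series
    simulate_ar1(0.2, 400, rng),                  # different dynamics
]

for index, dist in rank_candidates(short_series, candidates, p=3):
    print(f"candidate {index}: distance {dist:.3f}")
```

In practice one would probably select the order of each model with an information criterion and allow for ARMA or ARIMA fits, in which case the fitted models are first converted to their AR(∞) form and truncated before the coefficients are compared.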

Rob Hyndman

56,782

Thank you. I am quite new to time series clustering. One thing I cannot comprehend: why is the "closest" DGP mostly understood in the sense of distance? Why should the "closest" DGP imply some physical proximity (e.g., in DTW the closest series are those with similar dynamics, as far as I understand)? Series $X$ and $Y$ could have the same DGP and be described by, e.g., an AR(1) model with the same parameters, while $X$ is a scaled version of $Y$. I believe time series clustering methods won't capture this. Isn't the correct approach to identify the DGPs by estimating models and comparing them? – Sane Mar 29 '24 at 09:44
Btw, I am a big fan of your book on Forecasting :) – Sane Mar 29 '24 at 09:46
I think you misunderstand how distance is being used here. "Closest" implies some kind of distance. I think you want distance between DGPs rather than distance between time series realisations. That's what Piccolo's distance is designed to do. – Rob Hyndman Mar 30 '24 at 00:47
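The difference between the two kinds of distance can be seen in a small simulation. The Python sketch below is only an illustration under assumptions: all series are simulated AR(1) processes, the model-based distance is the absolute difference of the estimated AR(1) coefficients (which is what the Piccolo distance reduces to for AR(1) fits), and the helper names `simulate_ar1` and `ar1_coef` are hypothetical. Series `y` shares the DGP of `x` up to a scale factor of 10, while `z` has different dynamics at the same scale as `x`.

```python
import numpy as np


def simulate_ar1(phi, n, rng, scale=1.0):
    """Simulate an AR(1) series and multiply it by a constant scale factor."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return scale * x


def ar1_coef(x):
    """Least-squares estimate of the AR(1) coefficient of a mean-centred series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x[:-1] @ x[:-1]))


rng = np.random.default_rng(0)
x = simulate_ar1(0.7, 500, rng)
y = simulate_ar1(0.7, 500, rng, scale=10.0)   # same dynamics as x, scaled by 10
z = simulate_ar1(-0.7, 500, rng)              # different dynamics, same scale as x

# Distance between realisations: Euclidean distance between the raw values.
raw_xy = float(np.linalg.norm(x - y))
raw_xz = float(np.linalg.norm(x - z))

# Distance between fitted models: difference of the estimated AR(1) coefficients.
model_xy = abs(ar1_coef(x) - ar1_coef(y))
model_xz = abs(ar1_coef(x) - ar1_coef(z))

print(f"raw distance   x-y: {raw_xy:.1f}   x-z: {raw_xz:.1f}")
print(f"model distance x-y: {model_xy:.3f}   x-z: {model_xz:.3f}")
```

The least-squares AR coefficient does not change when a series is multiplied by a constant, so a coefficient-based distance ignores the scaling that dominates the distance between the raw realisations.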
