
I have samples of bounded random variables $X,Y$ at times $t=1,2,\ldots,T$. Denote them by $(x_1,y_1),(x_2,y_2),\ldots,(x_T,y_T)$. Overall, the correlation between $X$ and $Y$, as calculated from the samples, is not high. But I suspect that they may be highly correlated on some subset of their domain, e.g. within a box $a\le X \le b$, $c \le Y \le d$ or within some circle $\mathcal{C}$.

Is there an algorithm or procedure to zero in on such a region? To be more precise: if I specify the minimum size of the box or circle mentioned above, is there any way to find the region where $X,Y$ have maximum correlation?
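To make the problem statement concrete, here is a naive brute-force sketch of the search I have in mind. It is only an illustration under my own assumptions, not a recommended method: the function name `best_box`, the quantile grid of candidate edges, and the use of a minimum sample count `min_points` as the "minimum size" are all hypothetical choices. It also compares raw $|r|$ across boxes of different sizes, which tends to favor small boxes.

```python
import numpy as np

def best_box(x, y, n_grid=6, min_points=30):
    """Brute-force search over axis-aligned boxes on a quantile grid.

    Returns (abs_r, (a, b, c, d), n) for the box a <= x <= b, c <= y <= d
    with the largest |Pearson r| among boxes holding at least min_points
    samples, or None if no box qualifies.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Candidate box edges: empirical quantiles of each variable (an assumption).
    qs = np.linspace(0.0, 1.0, n_grid + 1)
    xe = np.quantile(x, qs)
    ye = np.quantile(y, qs)
    best = None
    for i in range(n_grid):
        for j in range(i + 1, n_grid + 1):
            in_x = (x >= xe[i]) & (x <= xe[j])
            for k in range(n_grid):
                for l in range(k + 1, n_grid + 1):
                    mask = in_x & (y >= ye[k]) & (y <= ye[l])
                    n = int(mask.sum())
                    if n < min_points:
                        continue  # box too small to be considered
                    xs, ys = x[mask], y[mask]
                    if xs.std() == 0 or ys.std() == 0:
                        continue  # correlation undefined for constant data
                    r = abs(np.corrcoef(xs, ys)[0, 1])
                    if best is None or r > best[0]:
                        best = (r, (xe[i], xe[j], ye[k], ye[l]), n)
    return best
```

Calling `best_box(x, y, n_grid=6, min_points=30)` on the sample arrays returns the winning box together with its $|r|$ and point count. The cost grows like `n_grid**4`, so this is only feasible for coarse grids.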

dexter04
  • Yes. And you don't have to specify the size of the region. Simply draw a "crayon plot" as described at https://stats.stackexchange.com/a/18200/919 and look for intense monochromatic patches. I wrote a paper about this several years ago (derived from that post) but it was rejected for publication in TAS as being "too obvious." – whuber Aug 07 '23 at 13:20
  • Thanks @whuber. Great idea. I am looking for something more algorithmic. Is there any way to convert this into something that a machine can understand? I need to run it over many time series. – dexter04 Aug 07 '23 at 18:38
  • To do that, you will need to frame your question in a more quantitative and specific manner, because currently it's really in an exploratory spirit -- namely, hunting for a subset that looks "highly correlated" in some sense. You can always find a perfectly correlated subset: just take any two values from your dataset. – whuber Aug 07 '23 at 18:48
  • @whuber I tried to do that by specifying that the box/circle over which the correlation is taken must be larger than some threshold. Any ideas how to do better? – dexter04 Aug 08 '23 at 09:19
  • That doesn't work because you have not specified any criterion for comparing correlations over different regions and you have only vaguely indicated what those regions might look like. It wouldn't make much sense to compare correlation coefficients directly, because random variation in small regions will have greater effects than random variation in larger regions. We lack any sense of why you are doing this and how you might interpret the results. Providing that kind of guidance can motivate appropriate solutions. – whuber Aug 08 '23 at 15:06
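The two cautions raised in the comments can be checked numerically. The sketch below is an illustration only, under assumptions that are not from the thread: $X$ and $Y$ are simulated as independent uniform variables (true correlation 0), and the helper name `corr_spread` and the sample sizes 10 and 200 are hypothetical. It shows that any two points are perfectly correlated, and that the sample correlation fluctuates far more in small samples than in large ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Any two distinct points lie on a line, so their sample correlation is +1 or -1.
two = rng.uniform(size=(2, 2))
r_two = np.corrcoef(two[:, 0], two[:, 1])[0, 1]

def corr_spread(n, n_rep=2000):
    """Standard deviation of the sample Pearson r over n_rep independent
    samples of size n, with X and Y independent (true correlation 0)."""
    rs = [
        np.corrcoef(rng.uniform(size=n), rng.uniform(size=n))[0, 1]
        for _ in range(n_rep)
    ]
    return float(np.std(rs))

spread_small = corr_spread(10)    # small region: few points
spread_large = corr_spread(200)   # large region: many points

print(f"r from two points:      {r_two:+.3f}")
print(f"sd of r with n = 10:    {spread_small:.3f}")
print(f"sd of r with n = 200:   {spread_large:.3f}")
```

Under independence the spread of $r$ is roughly $1/\sqrt{n-1}$, so a region with few points can show a large $|r|$ by chance alone. Any criterion for ranking regions would need to account for the number of points each region contains.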
