I have the following sample of a big data frame:

Time (ms)  Signal_1  Signal_2
0          0         0
1          0         0
2          0         1
3          0         0
4          1         0
5          0         0
6          0         0
.          .         .
.          .         .
.          .         .
996        1         1
997        0         0
998        0         0

Signal_1 indicates whether a heartbeat occurred in person X at time i.

Signal_2 indicates whether a heartbeat occurred in person Y at time i.

Time (ms) is the time i and the index of the data frame. Time = 0 is the beginning of the experiment. Time = 1000 marks one second after the beginning of the experiment.

Since the signals are nominal (boolean), how can I use VAR and Granger causality to determine whether Signal_1 causes Signal_2?

Is there a way to calculate the correlation between these binary time series?

Alexis

2 Answers

Correlation is not well suited to binary data, but there are many similarity indices you can use instead, such as the Jaccard index.
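As a rough illustration, the Jaccard index of two binary series can be computed as the number of time points where both signals are 1 divided by the number of time points where at least one is 1. The sketch below is only a minimal example: the function name `jaccard_index` and the toy lists are hypothetical and are not taken from the question's data.

```python
def jaccard_index(a, b):
    """Jaccard index of two equal-length 0/1 sequences: |A and B| / |A or B|."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # The index is undefined when neither series contains a 1.
    return both / either if either > 0 else float("nan")


# Hypothetical toy series (not the data from the question).
toy_1 = [1, 1, 0, 0, 1]
toy_2 = [1, 0, 0, 1, 1]
print(jaccard_index(toy_1, toy_2))  # 2 shared beats / 4 beats in total -> 0.5
```

Note that the index ignores time points where both signals are 0, which is usually what you want when events are sparse.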

Galen
  • Provided that neither variable is degenerate, Pearson's correlation is defined for binary variables. The numerator is the independence gap, and the denominator normalizes the scale according to the Cauchy-Schwarz inequality. It isn't inherently problematic to compute correlations on binary data, but it does require some attention to the mathematics. – Galen Aug 25 '23 at 15:43

Causal Models

Supposing causal sufficiency, the Markov property, and faithfulness, there are some simple candidate models to start with. You can extend these examples with causes that span multiple time points, but I have not shown those here. You could also suppose that each variable does not cause its own next value, but I find that unlikely in practice (explore that option if it is plausible in your case).

[Figure: diagrams of the candidate causal models]

You could use do-calculus to design an experiment to further investigate which of these models appears to be correct.

Correlation

You could compute a correlation score. I think computing the covariance is simpler. For binary variables it equals the independence gap, $\text{Cov}[S_1, S_2] = P(S_1 = 1, S_2 = 1) - P(S_1 = 1) P(S_2 = 1)$, and it is bounded: $\text{Cov}[S_1, S_2] \in \left[ -\frac{1}{4}, \frac{1}{4}\right]$.
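A minimal sketch of that calculation, assuming the two signals are available as equal-length lists of 0s and 1s. The function name `binary_cov_and_corr` and the toy inputs are hypothetical, and the estimates use plain sample proportions (dividing by n rather than n - 1).

```python
from math import sqrt


def binary_cov_and_corr(s1, s2):
    """Return (covariance, Pearson correlation) for two 0/1 sequences."""
    if len(s1) != len(s2) or len(s1) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(s1)
    p1 = sum(s1) / n                              # estimate of P(S1 = 1)
    p2 = sum(s2) / n                              # estimate of P(S2 = 1)
    p12 = sum(x * y for x, y in zip(s1, s2)) / n  # estimate of P(S1 = 1, S2 = 1)
    cov = p12 - p1 * p2                           # the independence gap
    denom = sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # Pearson's correlation is undefined if either series is constant.
    corr = cov / denom if denom > 0 else float("nan")
    return cov, corr


# Hypothetical toy series that attain the covariance bounds.
print(binary_cov_and_corr([1, 1, 0, 0], [1, 1, 0, 0]))  # (0.25, 1.0)
print(binary_cov_and_corr([1, 1, 0, 0], [0, 0, 1, 1]))  # (-0.25, -1.0)
```

The two toy calls show the extreme cases: identical balanced series reach the upper bound of 1/4, and complementary balanced series reach the lower bound of -1/4.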

Galen