What I am trying to do:
I am currently analyzing neuronal calcium imaging data.
In particular, I have two things:
- A time series that represents the amount of calcium within a neuron
- A boolean time series that encodes whether the mouse is performing a given activity at each time point
I want to see if a specific neuron is activated when the defined action is taking place.
The method I want to use:
One technique I have read about in several papers consists of building a linear classifier on the calcium time series: the trace is converted into a boolean array (1 where it is above a threshold, 0 where it is below). This boolean array is then compared with the boolean array that encodes the activity of interest, and a confusion matrix is computed. The goal is to see whether an elevation in calcium concentration encodes the activity.
In particular, we sweep over all possible thresholds (from the minimum to the maximum of the calcium signal). From the resulting confusion matrices we can build a ROC curve and use the area under it (AUC) as a performance metric for that particular neuron.
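To make the procedure concrete, here is a minimal sketch of the threshold sweep as I understand it, using NumPy only. The function name `roc_auc_by_threshold_sweep` and the choice of thresholds (every distinct calcium value, plus infinity so the curve starts at the origin) are my own assumptions, not something taken from the papers.

```python
import numpy as np

def roc_auc_by_threshold_sweep(calcium, behavior):
    """Sweep a threshold over the calcium trace and return (fpr, tpr, auc).

    Hypothetical helper: thresholds are all distinct calcium values,
    taken from high to low, preceded by +inf so the curve starts at (0, 0).
    """
    calcium = np.asarray(calcium, dtype=float)
    behavior = np.asarray(behavior, dtype=bool)
    thresholds = np.concatenate(([np.inf], np.unique(calcium)[::-1]))
    n_pos = behavior.sum()
    n_neg = (~behavior).sum()
    tpr = np.empty(len(thresholds))
    fpr = np.empty(len(thresholds))
    for k, th in enumerate(thresholds):
        predicted = calcium >= th                  # boolean array from calcium
        tp = np.sum(predicted & behavior)          # confusion-matrix entries
        fp = np.sum(predicted & ~behavior)
        tpr[k] = tp / n_pos
        fpr[k] = fp / n_neg
    # Area under the ROC curve by the trapezoidal rule
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return fpr, tpr, auc
```

An AUC near 0.5 would mean the calcium level carries no information about the activity, while values near 1 (or near 0) would mean the neuron is more (or less) active during it.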
The problem:
In these papers, the authors then wanted to check whether the results were statistically significant or could have been obtained by pure chance. They tested significance by circularly permuting the calcium time series: they select a random index "i" and swap the segment of the time series before "i" with the segment after "i". They claim that permuting in this way better preserves the physiological structure of the time series.
What I do not understand is why this permutation test is applicable to time series data. I have read about the exchangeability assumption that must be satisfied before applying a permutation test, but it does not seem to hold here: each calcium sample depends strongly on the samples before it. And even if we restrict ourselves to circular permutations, we still introduce a discontinuity at the point where the two segments are joined.
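For reference, this is how I would sketch the circular-shift test in Python. It is only my reading of the description, not the authors' code: the names `roc_auc` and `circular_shift_pvalue`, the number of shifts, the one-sided p-value, and the "+1" correction are all assumptions on my part.

```python
import numpy as np

def roc_auc(calcium, behavior):
    """AUC as the probability that a sample taken during the activity has a
    higher calcium value than a sample taken outside it (ties count 1/2)."""
    pos = calcium[behavior]
    neg = np.sort(calcium[~behavior])
    below = np.searchsorted(neg, pos, side="left")            # neg < pos
    below_or_equal = np.searchsorted(neg, pos, side="right")  # neg <= pos
    return (below + below_or_equal).sum() / (2.0 * len(pos) * len(neg))

def circular_shift_pvalue(calcium, behavior, statistic=roc_auc,
                          n_shifts=1000, seed=0):
    """Hypothetical helper: compare the observed statistic with the values
    obtained after circularly shifting the calcium trace by random offsets."""
    rng = np.random.default_rng(seed)
    calcium = np.asarray(calcium, dtype=float)
    behavior = np.asarray(behavior, dtype=bool)
    n = len(calcium)
    observed = statistic(calcium, behavior)
    # Random split index i in 1..n-1; np.roll swaps the two segments
    shifts = rng.integers(1, n, size=n_shifts)
    null = np.array([statistic(np.roll(calcium, s), behavior) for s in shifts])
    # One-sided p-value; the +1 counts the observed value itself
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shifts)
    return observed, null, p_value
```

Each shifted trace keeps the autocorrelation of the original everywhere except at the join, which is exactly the discontinuity I am asking about.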
Questions:
- Is this analysis valid, or does the exchangeability assumption rule it out?
- If this analysis cannot be done, are there alternative ways to test significance when I do not know the underlying null distribution?
Reference article:
https://pubmed.ncbi.nlm.nih.gov/31230711/
See “Analysis of Single Cell Responses During Behavior”