I am currently developing a cluster-extent permutation test for time-series data. As a sanity check, I want to see whether my test shows any bias when it is run on null data (i.e., no effect between conditions). My approach is as follows:
True observed procedure
1. I have 2 groups.
2. Each group has n subjects, and each subject has 25 datapoints.
3. I run 25 t-tests, one at each datapoint, between groups.
4. I get 25 p-values.
5. If a p-value is smaller than 0.05, I mark it as significant.
6. Significant p-values can form clusters.
7. I define cluster size as the number of adjacent datapoints with significant (first-level) p-values (cluster size = 0 for non-significant ps; cluster size = 1 if only one p is significant and its adjacent ones are not). A sketch of steps 3-7 is given after this list.
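For concreteness, here is a minimal sketch of steps 3-7 in Python, assuming each group is stored as a NumPy array of shape (n_subjects, 25); the function names and the per-datapoint cluster-size representation are just one way to implement the definition in step 7:

```python
import numpy as np
from scipy import stats

def first_level_pvalues(group_a, group_b):
    """Independent-samples t-test between groups at each of the 25 datapoints."""
    _, pvals = stats.ttest_ind(group_a, group_b, axis=0)
    return pvals  # shape (25,)

def cluster_sizes(pvals, alpha=0.05):
    """Per-datapoint cluster size: the length of the run of adjacent significant
    p-values a datapoint belongs to, and 0 for non-significant datapoints."""
    sig = pvals < alpha
    sizes = np.zeros(pvals.shape, dtype=int)
    i = 0
    while i < len(sig):
        if sig[i]:
            j = i
            while j < len(sig) and sig[j]:
                j += 1
            sizes[i:j] = j - i  # every datapoint in the run gets the run length
            i = j
        else:
            i += 1
    return sizes
```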
Permutation procedure
1. I shuffle group labels randomly, preserving the original sample sizes.
2. I run the (true observed) procedure described above on the shuffled data.
3. I get cluster sizes as described above, extract the largest cluster size, and store it in a distribution.
4. I repeat the process 10,000 times.
5. I end up with a distribution of maximal cluster sizes under the null (see the sketch after this list).
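A minimal sketch of this loop, reusing the helpers above and assuming the two groups can be stacked into one array so that labels can be reshuffled while the sample sizes stay fixed:

```python
def null_max_cluster_sizes(group_a, group_b, n_perm=10_000, alpha=0.05, rng=None):
    """Distribution of maximal cluster sizes under label permutation."""
    rng = rng or np.random.default_rng()
    data = np.vstack([group_a, group_b])       # (n_a + n_b, 25)
    n_a = group_a.shape[0]
    max_sizes = np.empty(n_perm, dtype=int)
    for i in range(n_perm):
        perm = rng.permutation(data.shape[0])  # shuffle group labels
        pa, pb = data[perm[:n_a]], data[perm[n_a:]]
        max_sizes[i] = cluster_sizes(first_level_pvalues(pa, pb), alpha).max()
    return max_sizes
```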
Finally
I assign p-values to my true observed clusters (true observed procedure #7) as the proportion of maximal cluster sizes in the null distribution (permutation procedure #5) that are equal to or larger than the observed cluster size.
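As a sketch, with `observed_sizes` and `null_max_sizes` coming from the two snippets above:

```python
def cluster_pvalues(observed_sizes, null_max_sizes):
    """Proportion of null maximal cluster sizes >= each observed cluster size."""
    return np.array([np.mean(null_max_sizes >= s) for s in observed_sizes])
```

With the per-datapoint convention from step 7, a datapoint with cluster size 0 automatically gets a cluster p-value of 1 (every maximal cluster size under the null is >= 0), which matches the observation below that the cluster ps under the null are smaller than 1 only about 5% of the time.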
My sanity test
I shuffle group labels before anything else, i.e., before running the first-level t-tests (true observed procedure #3).
I run everything as described above and repeat the whole process with 200 random shuffles of my "true observed" data.
I look at the "true observed" clusters' p-values (let's call these "cluster ps under the null" [NOT meaning the maximum cluster sizes of the permutation loop!]).
I observe these cluster ps under the null to be smaller than 1 about 5% of the time - this is good and makes sense, since under the null the first-level t-tests should only come out significant 5% of the time.
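Putting it together, a sketch of the sanity test (again with illustrative names, reusing the functions above):

```python
def sanity_check(group_a, group_b, n_outer=200, n_perm=10_000, alpha=0.05, seed=0):
    """Collect cluster ps under the null from repeated outer label shuffles."""
    rng = np.random.default_rng(seed)
    data = np.vstack([group_a, group_b])
    n_a = group_a.shape[0]
    cluster_ps_under_null = []
    for _ in range(n_outer):
        perm = rng.permutation(data.shape[0])   # outer shuffle: makes the "observed" data null
        pa, pb = data[perm[:n_a]], data[perm[n_a:]]
        obs_sizes = cluster_sizes(first_level_pvalues(pa, pb), alpha)
        null_max = null_max_cluster_sizes(pa, pb, n_perm, alpha, rng)
        cluster_ps_under_null.extend(cluster_pvalues(obs_sizes, null_max))
    return np.array(cluster_ps_under_null)
```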
My Question
How can I assess whether clusters are flagged as significant at the appropriate rate? My intuition tells me that it should be (ps referring to the "cluster ps under the null"):
count(ps <= 0.05) / count(ps < 1) = "what should be 0.05 with a valid test"
That is, out of all clusters that were at least of size 1 (i.e., p < 1), what is the proportion that was assigned a significant p-value (i.e., p <= 0.05)?
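In terms of the sketches above, the check I have in mind would be computed roughly like this (the call is purely illustrative, and whether this criterion is the right one is exactly what I am asking):

```python
ps = sanity_check(group_a, group_b)        # cluster ps under the null
proportion = np.mean(ps[ps < 1] <= 0.05)   # (ps <= 0.05) out of (ps < 1)
```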
Is my reasoning correct?
Thank you for your time and effort!
I have two groups with n participants per group. Each subject has 25 datapoints (I see that this was unclear in my post). I run separate t-tests at each datapoint across groups.
I get cluster p values by computing the proportion of maximal cluster sizes under the null that are bigger than or equal to the true observed cluster size.
Let's say there were 3 consecutive t-tests that were significant, so the cluster size is 3. The p-value of my cluster is then the proportion of the 10k maximal clusters under the null that were equal to or larger than 3.