
I have data from a study where subjects (considered as "generations" in a Telephone-like game) sang short melodies from one to the next, starting from a randomly-generated melody (seed) given to the subject who acted as the first generation.

The dependent variable was the average number of unique pitches (specifically, pitch classes, "PCs") across generations. The data came from 5 different experiments differing only in parameters not relevant here. Relevant for the conclusions of the study is whether a decrease in this number took place across generations, and if so, for which experiment version.

Plotting the data shows a decrease in two of the expt. versions (3f and 3m):

[Plot: average number of unique PCs per generation, one line per experiment version, with the seed's value at x=1]

However, it seems flimsy to claim this based on the plot alone, without any statistical test. I don't mean effect size here (whether there's been "enough" of a decrease), but how likely such an effect (of any size) would be to arise through chance alone.

I'm guessing bootstrapping would provide the "control" against which to test the significance of said decrease in the number of unique pitches. But it is unclear to me how the null/bootstrapped distribution should be obtained in this case, and how exactly I would then carry out the significance test (I've never used bootstrapping, I just know of it).

In the plot above, I included the DV's value for the seed (at x=1), as that is as good a control as any. However, I guess the bootstrapping would have to extend across the same number of data points (8 generations along the x-axis, in this case), rather than just covering the first time point.

I did these analyses in Matlab, if relevant. Any thoughts on how this test could be implemented (or what extra details I need to give to make the question answerable) would be very much appreciated.

z8080

1 Answer


This sounds more like a candidate for a permutation test.

Relevant for the conclusions of the study is whether a decrease in this number took place across generations.

So your hypothesis is that there is a decrease in PCs across generations. This means that your null hypothesis is that there is no such decrease, i.e. that all the generations are roughly the same, or "random". In that case, a permutation test would randomly permute the generation numbers to simulate a null distribution in which any number of unique PCs can occur in any generation, regardless of order.

The next decision you need to make is how to operationalize "decrease". There are dedicated tests for trends, but since you want a permutation test, you need to define some statistic that measures the decrease (e.g. the difference between the first and last generation, the average difference between consecutive generations, the slope of a linear regression, etc.). In the permutation test, you calculate the statistic over each permuted ("null") dataset and then compare the result from the experimental data to the distribution of the statistic over the null re-samples. This tells you how likely such a statistic would be if the null hypothesis were true: the $p$-value.
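To make the choice of statistic concrete, here is a sketch of the permutation test using the simplest option, the last-minus-first difference. It is in Python/NumPy purely for illustration (not the asker's Matlab code), and reuses the Expt 3m values read off the plot:

```python
import numpy as np

rng = np.random.default_rng(0)

# Expt 3m values read off the plot
unique_pcs = np.array([7.6, 7.7, 7.1, 6.6, 6.7, 6.4, 6.6, 6.75, 6.2])

# observed statistic: last generation minus first (negative = decrease)
observed = unique_pcs[-1] - unique_pcs[0]

# null distribution: if generation order is irrelevant, any shuffling
# of the values is as plausible as the observed ordering
n_perm = 5000
null_stats = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(unique_pcs)
    null_stats[i] = perm[-1] - perm[0]

# one-sided p-value: fraction of null statistics at least as negative
p_value = np.mean(null_stats <= observed)
print(p_value)
```

Any of the other candidate statistics (mean consecutive difference, regression slope) can be dropped into the same loop; only the two lines computing the statistic change.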

I don't use Matlab, so let me use Julia to give an example, as the syntax is quite similar.

```julia
using GLM
using Random
using Statistics

generation = [0:8;]

# Expt 3m data scraped from your plot
unique_PCs = [7.6, 7.7, 7.1, 6.6, 6.7, 6.4, 6.6, 6.75, 6.2]

X = hcat(ones(length(generation)), generation)
y = unique_PCs

# observed linear regression slope
slope = coef(lm(X, y))[2]

null_slopes = Float64[]
for i in 1:5000
    # randomly permute the rows of X (i.e. the generation labels)
    X_null = X[shuffle(1:size(X, 1)), :]
    null_slope = coef(lm(X_null, y))[2]
    push!(null_slopes, null_slope)
end

# calculate the probability that a slope from the null distribution
# is lower than or equal to the observed slope
mean(null_slopes .<= slope)
# 0.0056
```

As you can see from the example above, the probability of observing as small a regression slope as Expt 3m's under the null distribution is low.
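Since the comments below ask about porting this to Matlab: as a bridge, here is the same slope-based test sketched in Python/NumPy (again just an illustration), where `np.polyfit` and the label shuffle correspond directly to Matlab's `polyfit` and `randperm`:

```python
import numpy as np

rng = np.random.default_rng(42)

generation = np.arange(9)
unique_pcs = np.array([7.6, 7.7, 7.1, 6.6, 6.7, 6.4, 6.6, 6.75, 6.2])

# observed slope of a degree-1 fit (polyfit returns highest degree first)
observed_slope = np.polyfit(generation, unique_pcs, 1)[0]

# null distribution: refit after randomly permuting the generation labels
n_perm = 5000
null_slopes = np.empty(n_perm)
for i in range(n_perm):
    null_slopes[i] = np.polyfit(rng.permutation(generation), unique_pcs, 1)[0]

# one-sided p-value: how often a null slope is as negative as the observed one
p_value = np.mean(null_slopes <= observed_slope)
print(observed_slope, p_value)
```

Note that only the generation labels are permuted while the PC values keep their order, which is exactly the null hypothesis that order does not matter.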

Tim
  • This all sounds about right! Do you happen to know how I can implement this permutation test in Matlab? I found this user-created function but it's not clear to me how I would feed the experimental data into it as inputs. – z8080 May 23 '22 at 05:28
  • 1
    @z8080 I don't use Matlab, but it is fairly easy to implement: you basically need to permute the column of your data that holds the generation numbers. For R there is a book on this topic https://www.goodreads.com/book/show/15941460-introduction-to-statistics-through-resampling-methods-and-r – Tim May 23 '22 at 07:43
  • Fantastic, thank you – z8080 May 23 '22 at 08:03
  • I can't access the full text of the book now, but I wonder if I can follow-up quickly about the Matlab function (code pasted here for ease).

    It's a permutation test for a difference in means between two samples. I get that the first sample would be one of my generation=1:9 timeseries in the plot (say, for expt 1f). But how am I meant to define the second sample, since I don't want to compare with another version, but with the "null"?

    Also, that function doesn't allow computing a statistic, as you describe.

    – z8080 May 23 '22 at 13:25
  • Also, going back to my original question, why is it that permutation testing is more appropriate than bootstrapping in this case? Read a bit about the difference between the two here, but it's not entirely clear to me. – z8080 May 24 '22 at 11:27
  • @z8080 permutation test is just straightforward to define here. – Tim May 24 '22 at 11:30
  • But how, given that I am asked to provide two samples (see comment above), whereas I just want to test one (rather: each) of the 5 timeseries in my plot? – z8080 May 24 '22 at 12:44
  • 1
    @z8080 I provided an example. – Tim May 24 '22 at 14:08
  • Thanks a lot! The only bit of the code I don't understand is the end: are you computing the probability by taking the mean of the item-by-item division between elements of null_slopes and the scalar slope?! That command seems odd, and is not allowed in Matlab (and no Julia-to-Matlab converters seem to exist). – z8080 May 25 '22 at 07:46
  • Also, did you not mean "the probability of observing as large a regression slope as for Expt 3m in the "random" data under null distribution is low"?! The null would be that the slope is low, i.e. there's no increase/decrease across generations, so what would qualify as non-null is seeing a slope that's larger than that by a sufficient margin. Am I misunderstanding this? – z8080 May 25 '22 at 07:47
  • One more, sorry: does running one of these tests for every expt version count as multiple comparisons? feels like simply permuting gen numbers, but keeping the y values the same, risks capitalising on chance.. – z8080 May 25 '22 at 07:47
  • 1
    @z8080 multiple comparisons is a completely separate question that cannot be answered in a comment. It depends. I don't understand the second part of your comment, permutation tests do not "capitalize on chance" more than any other kind of hypothesis test. Your null hypothesis is that order does not matter and this is how the test is constructed. – Tim May 25 '22 at 08:40
  • Got it, thank you! What about the mean(.<=) calculation, and the small vs large slope issue? – z8080 May 25 '22 at 09:05
  • Understood. And the mean(.<=) computation? Is it just a frequentist expression of probability, as in "how many of these null slopes were smaller than the empirically-derived slope"? – z8080 May 25 '22 at 10:06
  • Also, to compute the linear regression slope I used polyfit(X, y, 1) in Matlab, but that expects X and y to be of equal size, which is not the case with the 2-column X matrix that you defined. If I just assign X=generation, then I get the same p-value at the end (barring the randomness) as you do, so I think I'm all set?! – z8080 May 25 '22 at 11:00
  • 1
    @z8080 two columns are intercept and slope, intercept is not interesting here, just for computational reasons. If your software includes the intercept by default that's fine. Also, the slope is just an example, you could use another statistic here as well. The mean of binary values is just an empirical estimate of probability from a sample. – Tim May 25 '22 at 11:12
  • Ah yes, that's clear now. And final question (sorry!): keeping slope as the metric, you said slope>1 means increase, <1 means decrease; but is 0 not the critical value, instead of 1? – z8080 May 25 '22 at 11:24
  • Smaller of course, but you seem to be describing a (sub vs supra-unitary) ratio, whereas I learned that increasing functions have slopes (a, in f(x)=ax+b) that are >0, and decreasing have slope < 0. Sorry if I'm being dim about it, just want to make sure I understand well. – z8080 May 25 '22 at 13:31
  • @z8080 yes, I'm deleting the comment, this is irrelevant; you care only about how your null distribution relates to the actual parameter. – Tim May 25 '22 at 14:03
  • Finished my analysis now, thanks to your brilliant answer and follow-up help! Do you think doubling the alpha to 0.1 is warranted, as for a one-tailed test, considering the hypothesis is directional, e.g. sharper decrease, greater difference between first and last, etc.? – z8080 May 25 '22 at 14:18
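To close the loop on the mean(null_slopes .<= slope) question from the comments: comparing an array against a scalar yields booleans, and the mean of booleans is the fraction of true values, i.e. the empirical probability. A minimal illustration (Python here, with made-up slope values; in Matlab the equivalent is mean(null_slopes <= slope), since logical arrays average the same way):

```python
import numpy as np

# made-up null slopes, purely for illustration
null_slopes = np.array([-0.20, -0.05, 0.01, 0.10, 0.12])
observed_slope = -0.15

# elementwise comparison yields booleans ...
flags = null_slopes <= observed_slope
# ... and their mean is the fraction that are True
p_value = flags.mean()

print(flags)    # [ True False False False False]
print(p_value)  # 0.2
```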