
Somebody asked me for help designing an animal experiment and its statistical analysis for a medical thesis.
She wants to "calculate" the sample size, and she wants to minimize it. She doesn't have any previous data yet.

The model is going to be a linear regression model with four covariates and repeated measures.
The participants will be randomly selected at the beginning. No further manipulation.

I know that calculating N depends on many subjective things.
I would like to suggest that she start with a small N instead and increase it as needed, with corrections for the p-values.
I don't know what corrections to use, because some of the data will be new while the old data are already known.
Something like Bonferroni doesn't sound good because the denominator will increase as fast as the number of analyses.

What would be the simplest way to do it? (Without Bayesian analysis, because I don't have experience with it.)

Maybe keep running new single experiments until she gets the desired effect?
Or maybe double N at every step?

skan

1 Answer


As you recognize, multiple testing with an adaptive design poses a risk of false-positive results (Type I error).

You certainly do not want to continue adding cases and testing the hypothesis repeatedly until you reach an apparently "significant" result. If you do not correct for multiple comparisons, this is an extreme form of p-hacking. If you do correct for multiple comparisons, then with a Bonferroni-type correction "the denominator will increase as fast as the number of analyses," as you say, and you will lose power to detect a true treatment effect.

There are some general strategies to adapt the sample size or otherwise reduce the number of individuals in a way that is statistically acceptable.

First, you can use interim data in a way that doesn't involve a hypothesis test. The variance among observations is often the most difficult quantity to specify in a power analysis. An early-stage estimate of the variance to refine the sample size, without a test of the treatment effect, poses little risk of inflating Type I error. See this FDA guidance (a useful overview of broader issues in adaptive designs).
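For instance, here is a minimal Python sketch of that idea. It uses the usual normal-approximation sample-size formula for a simple two-group comparison (not the repeated-measures regression she will actually fit), and the effect size, SD guess, and "interim data" are all made-up placeholders:

```python
import numpy as np
from scipy import stats

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-group comparison,
    using the usual normal-approximation formula."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(2 * (sd / delta) ** 2 * (z_a + z_b) ** 2))

# Hypothetical planning values.
delta = 1.0               # smallest treatment effect worth detecting
sd_planned = 2.0          # guess at the outcome SD made before any data exist
print("planned n per group:", n_per_group(delta, sd_planned))

# Interim look: re-estimate the SD from the pooled (blinded) observations,
# without testing the treatment effect, and recompute the target sample size.
rng = np.random.default_rng(0)
interim_data = rng.normal(10.0, 2.6, size=20)   # stand-in for the first animals measured
sd_interim = interim_data.std(ddof=1)
print("revised n per group:", n_per_group(delta, sd_interim))
```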

Problems arise when you perform hypothesis tests at one or more interim stages of the study: for example, you do an interim analysis to estimate both the treatment effect and the variance to adjust the sample size. If that's done, then you have at least two general strategies to try to minimize the sample size, although they need to be chosen during study design to avoid inflating Type I error.

One is to choose a p-value for the study's overall Type I error and design the study so that you "spend" a certain amount of that p-value at each interim stage. If you pass a corresponding criterion at an early stage, you can stop the trial at that point and test fewer individuals than anticipated. This Penn State web page outlines advantages and disadvantages of three ways to do that.
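As a rough illustration of why the per-look criterion has to be stricter than the usual 1.96, here is a small Python simulation of a two-look design with a constant (Pocock-style) boundary. The group sizes are arbitrary, and known-variance z statistics are used purely to keep the sketch simple:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n1, n2 = 50_000, 10, 10   # animals per group at look 1 / added before look 2 (hypothetical)

def overall_type1(crit):
    """Type I error of a two-look design that stops if |z| > crit at look 1,
    otherwise tests all animals at look 2 against the same critical value.
    Known-variance z statistics (true SD = 1) keep the illustration simple."""
    a1 = rng.normal(size=(n_sims, n1)); b1 = rng.normal(size=(n_sims, n1))
    a2 = rng.normal(size=(n_sims, n2)); b2 = rng.normal(size=(n_sims, n2))
    z1 = (a1.mean(1) - b1.mean(1)) / np.sqrt(2 / n1)
    mean_a = (a1.sum(1) + a2.sum(1)) / (n1 + n2)
    mean_b = (b1.sum(1) + b2.sum(1)) / (n1 + n2)
    z2 = (mean_a - mean_b) / np.sqrt(2 / (n1 + n2))
    return np.mean((np.abs(z1) > crit) | (np.abs(z2) > crit))

# Reusing 1.96 at both looks inflates the overall Type I error well above 0.05;
# a stricter constant boundary (about 2.18 for two equally spaced looks) restores it.
print("1.96 at both looks:", overall_type1(stats.norm.ppf(0.975)))
print("2.18 at both looks:", overall_type1(2.178))
```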

If you adaptively change the sample size based on results at an interim stage, you can perform the hypothesis test on each stage of the study separately, then do a test on their combined p-values. If the null hypothesis holds, then the p-values of all the stages are distributed uniformly and independently over [0,1], whether or not subsequent stages were re-designed based on the results of earlier stages. That allows a combined test on the p-values of the single-stage hypothesis tests. This paper discusses that approach. A z-test on a weighted sum of the z-scores corresponding to the stage-wise p-values is one choice, with the weights chosen so that the sum of their squares equals 1. For example, with two stages you could choose a weight of $1/\sqrt 2$ for each stage. To avoid inflating Type I error, you need to choose those weights at the beginning of the study.
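A minimal Python sketch of that combination test, assuming one-sided stage-wise p-values and the pre-specified equal weights $1/\sqrt 2$; the stage p-values shown are made up for illustration:

```python
import numpy as np
from scipy import stats

def combined_z_test(p_values, weights):
    """Weighted inverse-normal combination of one-sided stage-wise p-values.
    The weights must be fixed at the design stage and satisfy sum(w**2) == 1."""
    w = np.asarray(weights, dtype=float)
    assert np.isclose(np.sum(w ** 2), 1.0), "weights must have squared sum 1"
    z = stats.norm.isf(np.asarray(p_values, dtype=float))  # convert each p to a z-score
    z_comb = np.sum(w * z)                                  # still N(0,1) under H0
    return z_comb, stats.norm.sf(z_comb)                    # combined one-sided p-value

# Two stages with the pre-specified equal weights 1/sqrt(2); the stage p-values are made up.
z_comb, p_comb = combined_z_test([0.09, 0.04], [1 / np.sqrt(2), 1 / np.sqrt(2)])
print(f"combined z = {z_comb:.2f}, combined p = {p_comb:.3f}")
```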

Although such designs can control false positives, they can bias the estimates of the treatment effect. For example, if the interim analysis is "lucky" in finding a much stronger effect than the true value, the revised sample sizes will be smaller and the final estimate can put undue weight on the (now over-represented) "lucky" early cases.

The additional problem in your situation is that, at the sample sizes usually feasible for a medical thesis, these strategies might not help much. To be useful, they can require dozens to hundreds of cases available to the study, even if not all cases are ultimately tested. The FDA guidance indicates that adaptive sample sizes can decrease case numbers by about 15%. Try simulations, based on reasonable estimates of the study results, to see what you might expect to gain from them.
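As a starting point for such simulations, this sketch estimates how many animals per group the two-look design above would be expected to use, given a guessed true effect, compared with the corresponding fixed design; the effect size, group sizes, and boundary are placeholders to be replaced with her own planning values:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims, n1, n2 = 100_000, 10, 10   # per-group sizes at look 1 / added for look 2 (hypothetical)
crit = 2.178                        # Pocock-style two-look boundary, as in the sketch above
effect = 1.0                        # guessed true standardized effect, for planning only

# How often would the study stop at the first look under this effect,
# and how many animals per group would it use on average?
a1 = rng.normal(effect, 1.0, size=(n_sims, n1))
b1 = rng.normal(0.0, 1.0, size=(n_sims, n1))
z1 = (a1.mean(axis=1) - b1.mean(axis=1)) / np.sqrt(2 / n1)
stop_early = np.abs(z1) > crit
expected_n = n1 + (1 - stop_early.mean()) * n2
print(f"stops at look 1 in {stop_early.mean():.0%} of simulated studies")
print(f"expected animals per group: {expected_n:.1f} vs {n1 + n2} in a fixed design")
```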

EdM