
I'm analyzing data from a study comparing 5 treatment arms: women received ovarian stimulation for in vitro fertilization starting at different phases of the ovarian cycle. The treatment efficacy endpoint was the number of oocytes retrieved.

The hypothesis of the study is that stimulation can be started at any phase of the cycle, so that all treatments (whatever the phase) are equal. Can we say that this is an equivalence study?

Those who designed the study did not specify an equivalence margin a priori. Even with an a posteriori equivalence margin of, say, 0.5 oocytes, we still cannot draw any conclusions from this study. Keeping that margin of 0.5, are there any statistical tools that would let us reject or accept the alternative hypothesis without repeating the study? Are bootstrap or Bayesian methods an option?

Here are some results illustrating the problem:

Given the non-normal distribution of the observations, the statistical test is based on a comparison of medians. I haven't computed confidence intervals for the differences between medians, but I have added intervals for the difference in means between each pair of treatments. Even if I calculated confidence intervals for the median differences, I don't think we would be able to draw any conclusions.
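Since the question raises bootstrap methods, here is a minimal sketch of a percentile-bootstrap confidence interval for the difference in medians between two arms. The data below are simulated stand-ins (Poisson counts with hypothetical means and sample sizes), not the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical oocyte counts for two of the five arms (right-skewed
# count data); the real study data are not shown in the question.
arm_a = rng.poisson(lam=9.0, size=30)
arm_b = rng.poisson(lam=9.5, size=30)

def bootstrap_median_diff_ci(x, y, n_boot=10_000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI for median(x) - median(y)."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        bx = rng.choice(x, size=len(x), replace=True)
        by = rng.choice(y, size=len(y), replace=True)
        diffs[i] = np.median(bx) - np.median(by)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = bootstrap_median_diff_ci(arm_a, arm_b)
print(f"95% bootstrap CI for median difference: [{lo:.2f}, {hi:.2f}]")
```

Note that the bootstrap only gives an interval; the equivalence logic (TOST) would still require the relevant confidence interval to fall entirely inside (-0.5, +0.5), which is very unlikely with small samples.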

  • For a start, look at this page. I'm not sure, however, how to generalize from the 2-treatment comparisons discussed on that page to simultaneous equivalence testing on 5 treatments. Were any of the women treated more than once? If so, you need to take those intra-individual correlations into account. Also, as these are count data, you might want to consider a count-based model (e.g., negative binomial). – EdM Jan 05 '24 at 22:20
  • @EdM each individual received only one treatment, and only once – Seydou GORO Jan 06 '24 at 00:44
  • Is it the same treatment received at 5 different points of the ovarian cycle, or five different treatments received at the same point of the ovarian cycle? – dipetkov Jan 06 '24 at 10:23
  • It could be looked at as a non-inferiority study as well, since the hypothesis could be that starting Tx at any time is not worse than starting at other times. Non-inferiority trials usually occur when clinicians try to push the idea that the effects are not significantly worse, and that using their novel treatment modality saves costs. Recall, "equivalence" is the null hypothesis (there's no difference) vs. the alternative hypothesis (something's there), and it's either better (superiority trial) or not worse (non-inferiority trial). – wjktrs Jan 06 '24 at 20:16

1 Answer

Converting comments to answer:

I'm not sure, based on your description of the problem, that "equivalence" was ever a prespecified hypothesis. You administered 5 experimental treatments, assessed oocyte production, and then fit an ANOVA. That seems reasonable and consistent. I also notice the $p$-value is not significant. Wasn't your null hypothesis equivalence all along, with the question of interest being whether any method produced significantly more (or fewer) oocytes?
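As a concrete illustration of the omnibus comparison described above, here is a sketch with simulated stand-in data (not the study's). Since the question mentions non-normal counts and median-based testing, it shows a Kruskal-Wallis test alongside the one-way ANOVA:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: five arms of oocyte counts, 30 women each,
# simulated under the null of no difference between arms.
arms = [rng.poisson(lam=9.0, size=30) for _ in range(5)]

# Kruskal-Wallis: rank-based omnibus test across the 5 arms,
# the nonparametric analogue of the one-way ANOVA.
h_stat, p_krusk = stats.kruskal(*arms)

# Ordinary one-way ANOVA on the same data, for comparison.
f_stat, p_anova = stats.f_oneway(*arms)

print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_krusk:.3f}")
print(f"One-way ANOVA:  F = {f_stat:.2f}, p = {p_anova:.3f}")
```

A non-significant p-value here merely fails to reject the null of no difference; it does not, by itself, demonstrate equivalence.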

Many desperate analysts try to salvage "null" experiments by reversing their hypothesis like this so that they can report a "significant" result. That never makes sense. In fact, prespecified equivalence studies require very large sample sizes to demonstrate equivalence within meaningful margins.
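To make the sample-size point concrete, here is a sketch of a two-one-sided-tests (TOST) equivalence procedure for one pair of arms with the 0.5-oocyte margin, plus a rough normal-approximation sample-size calculation. The data, the assumed SD of 3 (the SD of a Poisson count with mean 9), and the sample sizes are all hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data for two arms, true difference = 0.
rng = np.random.default_rng(1)
x = rng.poisson(9.0, 30).astype(float)
y = rng.poisson(9.0, 30).astype(float)
delta = 0.5  # equivalence margin in oocytes

def tost_ind(x, y, delta, alpha=0.05):
    """Welch-t TOST: equivalence is concluded only if BOTH
    one-sided tests reject, i.e. max(p_lower, p_upper) < alpha."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / ((x.var(ddof=1) / len(x))**2 / (len(x) - 1)
                  + (y.var(ddof=1) / len(y))**2 / (len(y) - 1))
    p_lower = stats.t.sf((diff + delta) / se, df)   # H0: diff <= -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)  # H0: diff >= +delta
    return max(p_lower, p_upper)

p_tost = tost_ind(x, y, delta)

# Rough per-arm n for 80% power when the true difference is 0:
# n = 2 * sigma^2 * (z_{1-alpha} + z_{1-beta/2})^2 / delta^2
sigma = 3.0
z_a, z_b = stats.norm.ppf(0.95), stats.norm.ppf(0.90)
n_per_arm = 2 * sigma**2 * (z_a + z_b)**2 / delta**2

print(f"TOST p-value with n = 30 per arm: {p_tost:.3f}")
print(f"Approx. n per arm for 80% power:  {n_per_arm:.0f}")
```

Under these assumptions the required sample size runs into the hundreds per arm, which is why a post-hoc margin on a small study cannot rescue it.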

Post-hoc equivalence testing in a study that already has a small sample size is a particularly pernicious kind of p-hacking. You set an alpha so high, and a margin so wide, that, taken together, the study goes from a reasonable non-significant result to a joke.

AdamO