
I am trying to understand which statistical test is best in circumstances where you have both a before/after measurement (so paired data) and a control group.

There are t-tests for each scenario on its own (the two-sample t-test and the paired t-test), but I cannot find any information on what to do when you have a mix of the two.

Any help is appreciated.

  • In what way is the "before" population statistically different from the "control"? In that time has passed? – kqr Jul 07 '22 at 18:26
  • https://stats.stackexchange.com/questions/3466/best-practice-when-analysing-pre-post-treatment-control-designs – kjetil b halvorsen Jul 08 '22 at 12:21
  • You describe your before/after data as paired. Did you record the data in such a way that you can identify the before score and the after score for a given individual, that is, e.g., for Participant A? – Sal Mangiafico Jul 08 '22 at 16:58

2 Answers


Assuming you have before/after data for the control group as well, you are describing a BACI (Before-After-Control-Impact) design.

In this case, the best approach is probably to fit a linear model (regression/ANCOVA) with main effects for time (before/after) and treatment status (control/treatment), and an interaction term between the two. The interaction term is central to evaluating the results, and I strongly recommend plotting the model output to understand them well (interpreting interaction terms from the coefficients alone is a common source of confusion, and plotting avoids it).

In R, the code for the model would look like this:

lm(outcome ~ time * treatment_status, data = dat)
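As a rough illustration of that recommendation (a sketch, not part of mkt's answer), here is how the same model could be fitted and its four fitted cell means extracted for plotting, using Python's statsmodels formula interface; it assumes the data are in long format with one row per measurement, columns named as in the lm() call above, and entirely made-up numbers:

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format data: one row per measurement
dat = pd.DataFrame({
    "outcome":          [4, 5, 5, 6, 3, 5, 4, 7],
    "time":             ["before", "after"] * 4,
    "treatment_status": ["control"] * 4 + ["treatment"] * 4,
})

# main effects for time and treatment status plus their interaction
fit = smf.ols("outcome ~ time * treatment_status", data=dat).fit()
print(fit.summary())

# fitted mean for each time x treatment cell -- these are the points to plot
grid = pd.DataFrame({
    "time":             ["before", "after"] * 2,
    "treatment_status": ["control"] * 2 + ["treatment"] * 2,
})
grid["fitted_mean"] = fit.predict(grid)
print(grid)

Up to the sign implied by which factor levels are taken as the reference, the interaction coefficient is the difference in differences: the before/after change in the treatment group minus the before/after change in the control group.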

Other assumptions I have made:

  1. You have a single measurement before and after (if you have multiple measurements before and after, you might simply need to add a random effect for the experimental unit to address this; a minimal sketch follows this list).
  2. You have multiple individuals (or sites, or whatever the experimental unit is) in both control and treatment groups.
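For assumption 1, a minimal sketch of the random-effect version (again an illustration with simulated numbers, not mkt's code), assuming a column named subject identifies the experimental unit, could use statsmodels' MixedLM:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated data: 8 subjects, each with two measurements before and two after
rng = np.random.default_rng(1)
n_subj = 8
dat = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), 4),
    "time": np.tile(["before", "before", "after", "after"], n_subj),
    "treatment_status": np.repeat(["control", "treatment"], 2 * n_subj),
})
dat["outcome"] = rng.normal(5, 1, size=len(dat))

# random intercept per subject accounts for repeated measurements on the same unit
mixed = smf.mixedlm("outcome ~ time * treatment_status",
                    data=dat, groups=dat["subject"]).fit()
print(mixed.summary())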
mkt

The Bootstrap

If you can construct a well-defined function that expresses the observed effect magnitude (taking into account all four sets of data) in clinically important terms (which you really ought to be able to do – otherwise, what is it you are measuring?), you can always resort to the bootstrap:

  1. Draw four new sets of data by resampling with replacement from the ones you've observed.

  2. Compute the effect magnitude of these resampled data sets.

  3. Repeat the steps above many times.

  4. What you get out of it is an approximation to the sampling distribution of interest. In other words, from this distribution you can compute means, standard errors, confidence intervals or whatever you need.

Example with made-up data:

import math
import random

ct_before = [4, 5, 2, 5, 3, 4, 8]
ct_after = [5, 6, 1, 5, 3, 5, 7]
tx_before = [5, 3, 5, 3, 4, 4]
tx_after = [5, 8, 7, 6, 4, 6]

observations = (ct_before, ct_after, tx_before, tx_after)

# I don't know what is a clinically meaningful effect magnitude
# in your case, so I'm just taking the difference of the
# differences between the sums. You can come up with whatever you
# want here and the algorithm will work fine.

def effect_magnitude(ct_b, ct_a, tx_b, tx_a):
    ct_d = sum(ct_a) - sum(ct_b)
    tx_d = sum(tx_a) - sum(tx_b)
    return tx_d - ct_d

print(f'Observed effect magnitude: {effect_magnitude(*observations)}')

def bootstrap_replication(ct_b, ct_a, tx_b, tx_a):
    return (
        random.choices(ct_b, k=len(ct_b)),
        random.choices(ct_a, k=len(ct_a)),
        random.choices(tx_b, k=len(tx_b)),
        random.choices(tx_a, k=len(tx_a)),
    )

def bootstrap_distribution(observations, B=5000):
    for _ in range(B):
        yield effect_magnitude(*bootstrap_replication(*observations))

distribution = sorted(bootstrap_distribution(observations))

B = len(distribution)
distr_sum = sum(distribution)
distr_sq_sum = sum(v**2 for v in distribution)
mu = distr_sum / B
se = math.sqrt((distr_sq_sum - distr_sum**2 / B) / (B - 1))
p05 = distribution[math.floor(B * 0.05)]
p95 = distribution[math.ceil(B * 0.95)]

print(f'Bootstrap mean effect magnitude: {mu:.2f}')
print(f'Bootstrap standard error: {se:.2f}')
print(f'Gaussian 90 % confidence interval based on bootstrap se: [{mu - 1.645*se:.2f}, {mu + 1.645*se:.2f}]')
print(f'Naïve bootstrap 90 % confidence interval: [{p05:.2f}, {p95:.2f}]')

This outputs

Observed effect magnitude: 11
Bootstrap mean effect magnitude: 10.92
Bootstrap standard error: 7.57
Gaussian 90 % confidence interval based on bootstrap se: [-1.53, 23.36]
Naïve bootstrap 90 % confidence interval: [-1.00, 23.00]

Theoretical approach

If you for whatever reason want a more theoretical approach, you still need to start from a function that expresses the observed effect magnitude. Assuming we keep the one from the previous example, note that it is effectively a difference of two differences of sums.
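In symbols, with $c$/$t$ for control/treatment and $b$/$a$ for before/after (matching the subscripts used below):

$$E = \left(\sum_i x_{t,a,i} - \sum_i x_{t,b,i}\right) - \left(\sum_i x_{c,a,i} - \sum_i x_{c,b,i}\right)$$

which for the made-up data above works out to $(36 - 24) - (32 - 31) = 11$, the observed effect magnitude printed earlier.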

For each group, we get the sample variance:

$$s_{c,b}^2 = 3.62\;\;\;\;\;\;\;\;s_{c,a}^2 = 3.95$$

$$s_{t,b}^2 = 0.80\;\;\;\;\;\;\;\;s_{t,a}^2 = 2$$

The variance of the sum of $n$ independent observations is $n s^2$, and the variance of a sum or difference of independent sums is just the sum of the individual variances. So the variance of the effect magnitude in this example is

$$s_e^2 = 7 \cdot 3.62 + 7 \cdot 3.95 + 6 \cdot 0.8 + 6 \cdot 2 \approx 69.8$$

The standard error of the effect magnitude is then $\sqrt{69.8} \approx 8.35$.

This gives you an observed effect magnitude of $11$ with a theoretical standard error of $8.35$, and hence a Gaussian 90 % confidence interval of $\left[-2.74, 24.74\right]$.
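A few lines of Python, re-declaring the same made-up data, reproduce that arithmetic (statistics.variance gives the sample variance with the $n-1$ denominator):

import math
from statistics import variance

ct_before = [4, 5, 2, 5, 3, 4, 8]
ct_after = [5, 6, 1, 5, 3, 5, 7]
tx_before = [5, 3, 5, 3, 4, 4]
tx_after = [5, 8, 7, 6, 4, 6]

# n * s^2 for each group, summed over the four groups
var_effect = sum(len(g) * variance(g) for g in (ct_before, ct_after, tx_before, tx_after))
se = math.sqrt(var_effect)

effect = (sum(tx_after) - sum(tx_before)) - (sum(ct_after) - sum(ct_before))
print(f'Observed effect magnitude: {effect}')       # 11
print(f'Theoretical standard error: {se:.2f}')      # about 8.35
print(f'Gaussian 90 % confidence interval: [{effect - 1.645*se:.2f}, {effect + 1.645*se:.2f}]')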

This interval is slightly wider than the naïve bootstrap one, but overall the intervals are roughly the same size, which is a good sign. I would report both if I tried both approaches.

kqr