I would be grateful for some advice regarding the optimal approach to hypothesis testing in the following scenario:
We have data from a cell culture experiment obtained as follows: we culture some cells and then disperse equal amounts into 12 wells on a "plate". We apply an "injury" to all the wells; 6 wells receive a treatment with the aim of attenuating the injury; 6 wells receive no treatment and serve as our control. We repeat this experiment 2 more times. We expect the data from each plate to be correlated, but the responses of the treatment and control groups on a plate are considered independent. Our outcome variable is normally distributed. Our null hypothesis is that there is no difference in outcome between the treated and control groups.
Now, let's say we are compelled to obtain the most accurate p value we can from these data. I recognize that p values are plagued with problems, but for the sake of argument, there is no get-out-of-jail card and we are forced to provide a p value. A common strategy in the literature in my field is to take a summary measure (the mean) from each group on each plate and perform a t-test. This gives 3 independent treatment and 3 independent control data points. This seems statistically acceptable but obviously discards a large amount of data.
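For concreteness, here is roughly what the summary-measure analysis looks like in R, on simulated data with the same structure (the data frame `d`, its column names, and the simulated effect sizes are purely illustrative):

```r
## Simulated data with the structure described above: 3 plates x 12 wells,
## 6 treated and 6 control wells per plate, a plate-level random shift,
## and well-level noise (all names and values are illustrative).
set.seed(1)
d <- expand.grid(well = 1:6, treat = c("ctrl", "trt"), plate = 1:3)
d$y <- 10 + 0.5 * (d$treat == "trt") +   # fixed treatment effect
       rnorm(3, sd = 1)[d$plate] +       # between-plate (experiment) variation
       rnorm(nrow(d), sd = 1)            # within-well error
d$plate <- factor(d$plate)

## Summary-measure analysis: average the wells within each plate x group,
## then t-test the 3 treated means against the 3 control means (4 df).
plate_means <- aggregate(y ~ plate + treat, data = d, FUN = mean)
t.test(y ~ treat, data = plate_means, var.equal = TRUE)
```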
An alternative might be to model this as a linear mixed model:
We model the between-experiment (plate) variance: $u_i \sim N(0, \sigma_1^2)$
We model the within-well variance: $e_{ij} \sim N(0, \sigma_2^2)$
and the outcome $Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_i + e_{ij}$, where $X_{ij} \in \{0, 1\}$ is the treatment indicator, $i = 1, 2, 3$ indexes plates and $j = 1, \ldots, 12$ indexes wells. We have a balanced design with no missing data and 36 data points.
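Concretely, the fit I have in mind looks something like this (reusing the simulated `d` from the sketch above; the nlme random-intercept parameterisation is just one way to express it):

```r
## Linear mixed model corresponding to the equation above: random intercept
## u_i for each plate, residual e_ij for each well (reusing d from above).
library(nlme)

fit_lme <- lme(y ~ treat, random = ~ 1 | plate, data = d)
summary(fit_lme)   # beta_1 estimate and its standard error
anova(fit_lme)     # F test for the treatment effect
```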
Now we have the thorny issue of providing a p value from the linear mixed model. I know this is not a trivial problem, and this is really the guts of this question.
I have simulated this with a variety of approaches. We have a balanced design and no missing data, so arguably we could use the nlme package in R and the anova function to obtain a p value. This gives exactly the same result as if we had modelled it as a repeated-measures ANOVA. So with the linear mixed model we have 32 degrees of freedom versus the 4 degrees of freedom from the summary-measure analysis. If we use the lmerTest package with the Kenward-Roger approximation, we still have 32 degrees of freedom. A GEE approach to parameter estimation produces similar p values to the linear mixed model.
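For reference, the other two routes look roughly like this (lmerTest for the Kenward-Roger denominator df; geepack is just one possible package for the GEE):

```r
## lmerTest route: same random-intercept model, Kenward-Roger denominator df
## (needs the pbkrtest package installed).
library(lmerTest)
fit_lmer <- lmer(y ~ treat + (1 | plate), data = d)
anova(fit_lmer, ddf = "Kenward-Roger")

## GEE route: exchangeable working correlation within plates; geepack expects
## the rows to be ordered by cluster id, which d already is.
library(geepack)
fit_gee <- geeglm(y ~ treat, id = plate, data = d, corstr = "exchangeable")
summary(fit_gee)   # Wald test for the treatment coefficient
```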
This has the feel of having our cake and eating it too: compared to ordinary linear regression, we receive the advantage of a deflated standard error on our $\beta_1$ treatment effect (because we have a positive intraclass correlation and are making a within-plate comparison), but we suffer no penalty on the degrees of freedom. Surely this can't be right. Can it?
Modelling this as a survey-design-type analysis with 3 clusters seems to provide a compromise between the summary-measure and linear mixed model results, which is more reassuring.
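I set that up along these lines, using the survey package with each plate as a cluster (again on the simulated `d`; equal weights assumed):

```r
## Survey-design-style analysis: 3 plates as clusters, equal weights,
## cluster-robust standard errors with design df based on the 3 clusters.
library(survey)

des <- svydesign(ids = ~ plate, weights = rep(1, nrow(d)), data = d)
summary(svyglm(y ~ treat, design = des))
```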
What about applying a design-effect penalty to the sample size, $N_{\mathrm{eff}} = N / (1 + (n - 1)\rho)$, where $\rho$ is the intraclass correlation and $n$ is the cluster size, and then referring the LMM F statistic to the revised degrees of freedom?
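As a sketch, that calculation would go something like this (ICC taken from the mixed-model variance components; the choice of denominator df as $N_{\mathrm{eff}} - 2$ is just my crude guess at the adjustment):

```r
## Ad hoc design-effect correction: estimate the ICC from the mixed model,
## compute the effective sample size, and re-refer the F statistic for the
## treatment effect to the reduced denominator df (crude, for illustration).
library(lmerTest)

fit   <- lmer(y ~ treat + (1 | plate), data = d)
vc    <- as.data.frame(VarCorr(fit))
rho   <- vc$vcov[vc$grp == "plate"] / sum(vc$vcov)   # intraclass correlation
n     <- 12                                          # wells per cluster (plate)
N_eff <- nrow(d) / (1 + (n - 1) * rho)               # design-effect-adjusted N

Fstat <- anova(fit)["treat", "F value"]              # F for the treatment effect
pf(Fstat, df1 = 1, df2 = N_eff - 2, lower.tail = FALSE)
```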
I would be most grateful for some advice regarding the best approach here. The general issue, which I know has been discussed before, is how to obtain p values from correlated data from relatively small data sets.