
I am running an experiment and now it is time to do some analysis, but I am having a hard time figuring out the right way to analyze my data. I have a number of questions, but first let me give you some context.

This is a behavioral experiment, where the setting is as follows:

i) There are 2 conditions, let's call them A and B.

ii) Each participant is randomly assigned to either A or B.

iii) Regardless of the condition, each participant goes through 8 trials.

iv) At each trial, we show a stimulus to the participant, and then record whether they exhibit a certain behavior. So we have a binary variable, call it Target_Behavior, which takes the value 1 if the participant exhibits this behavior, and 0 otherwise.

The first thing I want to study is whether there is a significant difference between the two conditions in how frequently participants exhibit the target behavior. As you can see, for each participant we have 8 measurements, so it is a repeated measures setting, but I don't know how to choose the right test that takes this into account.

Furthermore, I don't know whether my data allows for any parametric assumptions, so I am only using non-parametric tests, which is detrimental to the power of my tests.

So far, I have been using a very simple (and probably wrong) solution: I look at each trial on its own. This way, for any given trial we have only a single measurement per participant. Considering trial 1, for example, I make a list containing the values of Target_Behavior for condition A and another for condition B, and then I run a chi-square test comparing the two lists.
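In R, my current approach looks roughly like this (the data frame dat and its column names are just for illustration, not my actual variable names):

# Per-trial test, e.g. for trial 1; `dat` has one row per participant per trial.
trial1 <- subset(dat, Trial == 1)
chisq.test(table(trial1$Condition, trial1$Target_Behavior))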

I am pretty sure this is suboptimal though, and it also poses an extra challenge. I am convinced there is a significant difference between conditions A and B, but the test I just described only gives significant results when the difference in frequency between the two conditions is about 30%. At a 20% difference it comes out non-significant, which doesn't seem right to me: 20% is a huge discrepancy!

I have about 20 participants in each condition, if this is of any help.

I would really appreciate your help in figuring out the right way to deal with this situation. I have been looking online for weeks and still haven't found a solution. I have some additional questions, but for now I think this is the most important one.

Thank you in advance!

John_P
  • It is my first PhD experiment, so I am getting a bit anxious about finding the right test. I guess that, simply put, the question I would like you to help me with boils down to: how can I test whether there is a difference between 2 conditions, if for each participant I have 8 measurements (i.e. trials), where each measurement is of the form "Success" (encoded as 1) or "Fail" (encoded as 0)? – John_P Apr 15 '22 at 14:00
  • For future reference, a far better time to be considering what analysis to do is when you are planning the experiment. Since this is early in your research, maybe you shouldn't be testing at all, but exploring the data and mining them for testable, interesting hypotheses. That's a fully legitimate exercise. – whuber Apr 20 '22 at 22:47
  • I totally agree with that, but in my domain it is time consuming to find suitable participants, so my supervisor insisted that we first formulate our hypothesis and then perform the experiment to either confirm it or not. Perhaps if my next experiments involve easier-to-find groups, we will perform a kind of exploratory analysis. Currently I am trying to get the resources needed to know what analysis to use under each possible setting. If you have any references or books on the topic, I would be happy to know. – John_P Apr 21 '22 at 17:39

1 Answer


Caveat: You don't state it explicitly but you imply that, for each participant, their trials are independent with the same probability of success. In this case, the number of successes is a binomial random variable with size = 8 and unknown probability p.

Update: In the comments you explain that the independence assumption holds but the success probability may change as participants become familiar with the experiment. In this case, the trials are binomial random variables with size = 1 and unknown probability p_t. You can still use binomial regression after updating the model to include a fixed effect t for the sequence of trials.
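As a minimal sketch, assuming the data are reshaped to long format (one row per trial) with hypothetical columns Success, Condition, Trial and ID, the updated model could look like:

library("lme4")

# Trial-level binomial regression with a fixed effect for trial order
# and a random intercept per participant. Column names are hypothetical.
model_t <- glmer(
  Success ~ Condition + Trial + (1 | ID),
  family = binomial(link = "logit"),
  data = dat
)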


We can compare conditions A and B with a binomial regression. We assume that:

  • Each participant goes through a fixed number of trials (not necessarily the same number for every participant).
  • The success probability of each participant is a function of their group assignment + random noise.

Under this model, conditions A and B have effects α and β, and the average success probabilities are a function of these effects. [Mathematically, effect = logit(probability).]

Finally, we test whether there is a difference between conditions A and B with a Wald z-test on the difference in effects, β - α.

The following code illustrates how to do this analysis in R.

library("broomExtra")
library("lme4")

set.seed(1234)

n <- 20 trials <- 8

alpha <- 0.5 beta <- 2 sigma <- 0.1 # std. deviation of participant random effect

id <- seq(1, 2 * n) condition <- rep(c("A", "B"), each = n)

On average participants in group A have effect alpha

and participants in group B have effect beta

effect <- ifelse(condition == "A", alpha, beta)

Add random noise to each participant's effect.

effect <- effect + rnorm(2 * n, sd = sigma)

aggregate(effect, list(condition), mean) #> Group.1 x #> 1 A 0.4749336 #> 2 B 1.9422930

The success probability is the inverse logit of the participant's effect.

prob <- plogis(effect)

Finally generate the outcome: number of success in a fixed number of trials.

successes <- rbinom(2 * n, size = trials, prob = prob)

Let's start with a simpler model that assumes participants in each group have exactly the same probability of success. This is not true in the simulation and it's also not a reasonable assumption in general.

model1 <- glm(
  cbind(successes, trials - successes) ~ condition,
  family = binomial(link = "logit")
)
tidy(model1, conf.int = TRUE, conf.level = 0.95)
#> # A tibble: 2 × 7
#>   term        estimate std.error statistic     p.value conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>       <dbl>    <dbl>     <dbl>
#> 1 (Intercept)    0.674     0.167      4.03 0.0000548      0.352      1.01
#> 2 conditionB     1.67      0.326      5.12 0.000000299    1.06       2.34

It's more appropriate, and in the simulation definitely correct, to assume that participants' effects are not fixed but random, with mean equal to the condition effect.

model2 <- glmer(
  cbind(successes, trials - successes) ~ condition + (1 | id),
  family = binomial(link = "logit")
)
tidy(model2, conf.int = TRUE, conf.level = 0.95)
#> # A tibble: 3 × 9
#>   effect   group term   estimate std.error statistic  p.value conf.low conf.high
#>   <chr>    <chr> <chr>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 fixed    <NA>  (Inte…    0.704     0.200      3.51  4.42e-4    0.311      1.10
#> 2 fixed    <NA>  condi…    1.72      0.367      4.69  2.67e-6    1.00       2.44
#> 3 ran_pars id    sd__(…    0.440    NA         NA    NA         NA         NA

In the summary table, conditionB estimates the effect difference β - α between conditions A and B. Since this is a simulation, we know that the true difference is β - α = 2 - 0.5 = 1.5. Its estimate from the data is conditionB = 1.72 with a 95% confidence interval [1.00, 2.44]. The Wald z-statistic for the null hypothesis that there is no difference between the conditions is 4.69 and the corresponding p-value is 2.7e-6. So there is strong evidence to reject the null, which of course does not hold in the simulation.

The effects α and β are logits (log odds) of the success probabilities; the difference β - α is the log odds ratio. It's easy to convert effects to probabilities with the inverse logit transformation.

# Pr{success in group A}, Pr{success in group B}, Pr{success in B} - Pr{success in A}
c(plogis(alpha), plogis(beta), plogis(beta) - plogis(alpha))
#> [1] 0.6224593 0.8807971 0.2583377
# estimate of Pr{success in B} - Pr{success in A}
plogis(1.72 + 0.704) - plogis(0.704)
#> [1] 0.2495652
dipetkov
  • Thank you very much for your help. You are right, I didn't say anything about the independence of the trials, but the thing is that it is a bit complicated. Although there is no direct dependence between trials, as participants get familiar with the stimuli they are expected to exhibit the target behavior more. The trials themselves are independent, but there is a latent variable of "familiarity" that could (or maybe should) influence their behavior. I hope I am explaining this clearly enough; does it make any sense to you? Do you maybe have a suggestion for this case, or at least some references to study? – John_P Apr 21 '22 at 17:34
  • Binomial regression can incorporate a possible "learning with time" effect. For example, consider the model Y_ij = condition + time + (1 | id), where Y_ij is the jth trial of the ith participant. This model has a fixed condition effect, a fixed time effect and a random participant effect. You can also let time interact with condition. – dipetkov Apr 21 '22 at 18:10
  • The model I proposed first is equivalent to Y_ij = condition + (1 | id). So you can see how you can add complexity within the same framework of binomial regression, and even add a non-linear time effect using splines. However, I'd suggest you start simple and add complexity step by step, only if the more complex model fits the experimental data better. – dipetkov Apr 21 '22 at 18:11
  • You cannot imagine how helpful your answers have been! I have a couple of somewhat silly questions though: i) the variable "time" you include in the model is the trial number, right? ii) How should I interpret the final coefficient? For example, in your initial example, what does the value of 1.72 mean? The question I really want to answer is whether there is any significant difference across conditions. After that I would also like to study the interaction of condition and trial, as you suggested. In general, do you have any references about experimental designs and appropriate statistical analysis? – John_P Apr 21 '22 at 18:40
  • Perhaps time wasn't the best choice of term. Yes, I mean the order / number of the trial. Chronological time could make sense in a situation where we expect participants to practice between trials to improve their chance of success. But your description of familiarity suggests the order of trials makes more sense. And I've updated the answer to explain the relevant statistic. – dipetkov Apr 21 '22 at 21:22
  • About experimental design, I found this CV thread. But instead I'd suggest asking the advice of your PhD advisor or, even better, the professors & other grad students in your program. This way you'll find recommended textbooks in your area of study. [Aside note: Commenters don't get notified of your replies unless you tag them with @username.] – dipetkov Apr 21 '22 at 21:22
  • Thank you very much for everything :) So if I got it right, I should format my file as: User id, Number of Trial, Condition, Success, and then fit a model that predicts Success, using fixed Number of Trial and Condition effects, as well as a random User id effect. This would also mean that for each participant I would have repeated values in User id and Condition columns. Should I specify anything as a factor though? I am not really sure, but if I am to study differences among trials, shouldn't I convert them to factors? Is it fine to use User id as an integer? – John_P Apr 22 '22 at 10:31
  • Unfortunately, my advisor is not really an expert in statistical analysis, and my peers are mostly familiar only with standard tests like chi-square. Getting in touch with a professor from a more technical department is an option; I am just a bit hesitant because I feel like my questions are going to be boring and make me seem clueless, which I am, but so far I have been trying to deal with this fact on my own by looking for online resources. – John_P Apr 22 '22 at 10:36
  • I suggest the following coding: Success is 0 or 1, ID is a unique integer or a factor, Condition is "A" or "B", trial number is an integer. I wouldn't make #trial a factor because then the sequential ordering is lost: trial #1 would be (potentially) as different from trial #2 as it is from trial #8. Your hypothesis is that participants get better with each trial. Without having seen any data, I'd be interested in three hypotheses about the #trial effect: no #trial effect, linear effect, smooth nonlinear effect. In this order (see the sketch after these comments). – dipetkov Apr 22 '22 at 11:33
  • Start simple, especially since you are also learning about stats. Work on the no-#trial-effect vs linear-effect comparison first. You can ask for more feedback on CV (in a new question); there are model checks that you can/should do to confirm that a particular model makes sense for your data. The stakes are high because you spent time & effort on your experiment, so you shouldn't worry about asking for help so that you can draw conclusions from the data. – dipetkov Apr 22 '22 at 11:33
  • On the topic of (text)books about statistics, here is a very opinionated & biased list: The Art of Statistics, R tutorials, Mixed Models. A mixed model has both fixed effects (eg. condition) and random effects (eg. participant). – dipetkov Apr 22 '22 at 11:33
  • Thank you for the suggestions and all your help. I will probably start a new thread as soon as I advance my analysis a bit. – John_P Apr 23 '22 at 09:24
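To make the last suggestion concrete, here is a minimal sketch of the three nested #trial models in lme4, following the long-format coding suggested in the comments. The data frame dat and its column names (Success, Condition, Trial, ID) are hypothetical:

library("lme4")
library("splines")

# No #trial effect, linear #trial effect, smooth nonlinear #trial effect.
m0 <- glmer(Success ~ Condition + (1 | ID),
            family = binomial, data = dat)
m1 <- glmer(Success ~ Condition + Trial + (1 | ID),
            family = binomial, data = dat)
m2 <- glmer(Success ~ Condition + ns(Trial, df = 3) + (1 | ID),
            family = binomial, data = dat)

# Likelihood ratio tests comparing the nested models.
anova(m0, m1, m2)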