Comment: As @Ralph Winters points out, the question "Is the probability of choosing B the same in all four groups?" can be answered with a chi-squared test of independence. This analysis can stand on its own and might be all the OP needs to complete the study. On the other hand, we might want to know how the probability of B differs across groups, e.g. in which direction (lower/higher) and by how much. To answer such questions, we estimate the probabilities in each group and compare them.
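As a minimal sketch of that chi-squared test, the snippet below cross-tabulates group against "chose B" versus "did not choose B" and calls `chisq.test()`. The counts are hypothetical numbers made up for illustration; they are not the OP's data.

```r
# Hypothetical counts (assumption: invented for illustration only).
# Rows are the four groups, columns are "chose B" vs "did not choose B".
counts <- matrix(
  c( 25, 100,
     27,  98,
    250, 250,
    150, 100),
  nrow = 4, byrow = TRUE,
  dimnames = list(Group = c("1", "2", "3", "4"),
                  Choice = c("B", "Not B"))
)

# Chi-squared test of independence between Group and choosing B
test <- chisq.test(counts)

test$statistic  # chi-squared statistic
test$parameter  # degrees of freedom: (4 - 1) * (2 - 1)
test$p.value    # a small p-value suggests P(B) is not the same in all groups
```

The test only says whether the probabilities differ somewhere; it does not say which group differs or by how much, which is what the contrasts are for.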
You want to compare the levels of a categorical variable in terms of their effect on the response. It's easier to make such comparisons in terms of contrasts than in terms of regression coefficients.
Specifically, you want to compare one of four groups, Group 4, to the other groups, Groups 1 to 3, in terms of the probability that a participant chooses "B" given the choices "A", "B" and "Neither".
You already know that, since the outcome is one of three pre-determined categories, an appropriate model is multinomial regression. So I focus on how to estimate the contrasts between Group 4 and the other three groups.
In fact, I formulate two comparisons:
[Q1] Does a participant from Group 4 choose B with a higher probability than participants in groups 1, 2 and 3?
[Q2] Does a participant from Group 4 choose B with a higher probability than the average probability for groups 1, 2 and 3?
The answers to these two questions can differ if the probability of choosing B, $P_g(B)$, varies among groups $g \in \{1,2,3\}$.
I use the emmeans package to answer questions Q1 and Q2 in terms of contrasts. For even more fun I create an uneven sample: Groups 1, 2, 3 and 4 account for 12.5%, 12.5%, 50% and 25% of the data, respectively. I specify the following probabilities for choosing "A", "B" and "Neither" in each group.
sizes <- n * c(.125, .125, 0.5, 0.25)
probs_Group1 <- c(0.8, 0.2, 0.1) # Probabilities of A, B, Neither
probs_Group2 <- c(0.8, 0.2, 0.1)
probs_Group3 <- c(0.4, 0.5, 0.1)
probs_Group4 <- c(0.3, 0.6, 0.1)
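One detail worth checking: the vectors for Groups 1 and 2 sum to 1.1 rather than 1. `rmultinom()` rescales `prob` internally so that it sums to 1, so the simulation still runs, but the effective probabilities in those two groups are slightly lower than the nominal ones. A small sketch of the rescaling (the names on the vector are added here for readability):

```r
# Nominal probabilities for Groups 1 and 2, as specified above
probs_Group1 <- c(A = 0.8, B = 0.2, Neither = 0.1)

# The vector does not sum to 1 ...
total <- sum(probs_Group1)

# ... so rmultinom() effectively samples with the rescaled probabilities
effective <- probs_Group1 / total

total
round(effective, 3)
```

The effective $P(B)$ in Groups 1 and 2 is therefore roughly 0.18 rather than 0.2; the target values quoted in this answer use the nominal 0.2.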
The code to generate the data is attached at the end of this answer.
data
#> # A tibble: 10,000 × 3
#>    Group Choice Group4
#>    <chr> <chr>  <chr>
#>  1 1     A      Other
#>  2 1     A      Other
#>  3 1     A      Other
#>  4 1     A      Other
#>  5 1     B      Other
#>  6 1     A      Other
#>  7 1     A      Other
#>  8 1     A      Other
#>  9 1     A      Other
#> 10 1     A      Other
#> # … with 9,990 more rows
Now let's answer question Q1. To do this we compare Group 4 to the other groups by modeling choice as a function of the indicator variable Group4 defined as "If participant is in Group 4, then 4 else Other". I name the levels 4 and Other rather than 1 and 0 so that the results are easier to interpret.
# Fit multinomial model for Choice by Factor
model1 <- multinom(Choice ~ Group4, data = data)
# Make pairwise comparisons between factor levels with pairs(),
# separately for each choice (by = "Choice").
# Do a one-sided "greater than" hypothesis test.
pairs(
  emmeans(model1, ~ Choice | Group4, mode = "prob"),
  by = "Choice", side = ">"
)
See the emmeans documentation to learn about the formula syntax.
#> Choice = B:
#> contrast estimate SE df t.ratio p.value
#> 4 - Other 0.20719 0.01128 4 18.372 <.0001
#>
#> P values are right-tailed
A "contrast" is a statistical term for a comparison. In this case the probability of choosing "B" is estimated to be 0.21 higher in Group 4 than in the Other groups.
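With a single two-level predictor, the model-based contrast should essentially reproduce the difference between the observed proportions of "B" in the two levels. The sketch below checks this kind of comparison with base R only, swapping in `prop.test()` for the emmeans contrast. The toy data and all object names (`n4`, `n_other`, `choice`, `group4`) are hypothetical and separate from the dataset used in this answer.

```r
set.seed(42)

# Hypothetical toy data: 250 participants in Group 4, 750 in Other
n4 <- 250
n_other <- 750
choice_labels <- c("A", "B", "Neither")
choice <- c(
  sample(choice_labels, n4, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  sample(choice_labels, n_other, replace = TRUE, prob = c(0.5, 0.4, 0.1))
)
group4 <- rep(c("4", "Other"), times = c(n4, n_other))

# Observed proportion of "B" in each level of the indicator
chose_b <- choice == "B"
prop_b <- tapply(chose_b, group4, mean)
diff_b <- prop_b[["4"]] - prop_b[["Other"]]

# One-sided two-sample test of proportions (H1: P_4(B) > P_Other(B))
test <- prop.test(
  x = c(sum(chose_b[group4 == "4"]), sum(chose_b[group4 == "Other"])),
  n = c(n4, n_other),
  alternative = "greater"
)

diff_b        # observed difference in the proportion choosing "B"
test$p.value  # right-tailed p-value
```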
Since we simulated the data, we know the true probabilities and can check whether the estimate agrees with the true difference. Other consists of 16.7% Group 1 with $P_1(B)$ = 0.2, 16.7% Group 2 with $P_2(B)$ = 0.2 and 66.7% Group 3 with $P_3(B)$ = 0.5. Therefore, the probability of choosing B in Other is the weighted average 0.167 × 0.2 + 0.167 × 0.2 + 0.667 × 0.5 = 0.4. So the true difference between Group 4 and Other is 0.6 - 0.4 = 0.2, while the estimated difference is 0.21.
However, there are four times as many participants in Group 3 as in each of Groups 1 and 2. So the probability of choosing "B" in the Other group is pulled towards $P_3(B)$, which in the simulation is higher than both $P_1(B)$ and $P_2(B)$.
Question Q2 asks to compare the probability of B in Group 4 to the average probability for groups 1, 2 and 3. In this case the three groups contribute equally to the average even though they are sampled unevenly. So the probability of choosing "B" is the unweighted average (0.2 + 0.2 + 0.5) / 3 = 0.3. And the difference with Group 4 is 0.6 - 0.3 = 0.3.
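The arithmetic behind the two target differences can be written out in a few lines. This sketch uses the nominal values of $P_g(B)$ and the group shares stated in the text; the object names (`p_b`, `shares`, `others`) are introduced here for illustration.

```r
# Nominal P(B) per group and each group's share of the sample
p_b    <- c(`1` = 0.2,   `2` = 0.2,   `3` = 0.5, `4` = 0.6)
shares <- c(`1` = 0.125, `2` = 0.125, `3` = 0.5, `4` = 0.25)
others <- c("1", "2", "3")

# Q1: groups 1 to 3 are pooled, so each group counts in proportion to its size
pooled <- weighted.mean(p_b[others], w = shares[others])

# Q2: groups 1 to 3 are averaged, so each group counts equally
averaged <- mean(p_b[others])

diff_q1 <- p_b[["4"]] - pooled
diff_q2 <- p_b[["4"]] - averaged

round(c(Q1 = diff_q1, Q2 = diff_q2), 3)
```

The only difference between the two questions is the set of weights applied to Groups 1, 2 and 3.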
Now let's answer question Q2. To do this we nest groups 1, 2 and 3 under the level Other of a new grouping factor, Nested.
# Fit multinomial model for Choice by Factor
model2 <- multinom(Choice ~ Group, data = data)
# Group factor levels into higher-level nested categories.
grid <- ref_grid(model2)
grid_grouped <- add_grouping(
  grid, "Nested", "Group",
  c("Other", "Other", "Other", "4")
)
# Make pairwise comparisons between the nested categories.
pairs(
  emmeans(grid_grouped, ~ Choice | Nested),
  by = "Choice", side = ">"
)
See the emmeans documentation to learn about grouping factors.
#> Choice = B:
#> contrast estimate SE df t.ratio p.value
#> 4 - Other 0.3146 0.01131 8 27.821 <.0001
#>
#> Results are averaged over the levels of: Group
#> P values are right-tailed
The estimated contrast, 0.31, is close to the true difference of 0.3, which is some evidence that the intended comparison is estimated correctly.
R code to simulate a multinomial dataset for a study of four groups of participants and three choices.
library("nnet")
library("emmeans")
library("tidyverse")
set.seed(1234)
choices <- c("A", "B", "Neither") # labels match the choices named in the text
# Helper function to draw n samples from the multinomial distribution
# over the three choices with probabilities given by the vector prob
randomize_choice <- function(n, prob) {
  r <- rmultinom(n, 1, prob)
  choices[seq_along(choices) %*% r]
}
# Number of participants
n <- 10000
sizes <- n * c(.125, .125, 0.5, 0.25)
probs_Group1 <- c(0.8, 0.2, 0.1)
probs_Group2 <- c(0.8, 0.2, 0.1)
probs_Group3 <- c(0.4, 0.5, 0.1)
probs_Group4 <- c(0.3, 0.6, 0.1)
data <-
  tibble(
    Group = rep(c("1", "2", "3", "4"), times = sizes),
    Choice = case_when(
      Group == "1" ~ randomize_choice(n, probs_Group1),
      Group == "2" ~ randomize_choice(n, probs_Group2),
      Group == "3" ~ randomize_choice(n, probs_Group3),
      Group == "4" ~ randomize_choice(n, probs_Group4),
      TRUE ~ NA_character_
    ),
    Group4 = if_else(Group == "4", "4", "Other")
  )