I have a categorical factor with 100 levels, each with its own proportion. I would like to test (a) whether these proportions differ from 50%, and (b) whether any particular levels differ more from 50% than others.
I was thinking I could use a binomial generalized linear model to predict the proportions. The intercept in the regression model would tell me whether the proportions differ, on average, from 50% (a log-odds of 0), right?
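set.seed(1)                 # make the simulated proportions reproducible
props <- runif(148, 0, 1)   # one simulated proportion per level
x <- 1:148                  # level identifier
df <- data.frame(x, props)
# Intercept-only model; non-integer proportions without weights give a warning under family = binomial
m <- glm(props ~ 1, data = df, family = binomial)
summary(m)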
The intercept above is on the logit scale, so its test indicates whether the average proportion differs from 50% (a log-odds of 0).
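To make the intercept interpretation concrete, here is a minimal sketch of the same intercept-only model fitted to counts rather than bare proportions. It assumes the original successes and failures per group are still available; `n_clusters`, the simulated `successes`, and the object names are hypothetical and not taken from the real data.

```r
set.seed(1)                                  # reproducible simulated data
n_groups   <- 148                            # same number of rows as in the example above
n_clusters <- 40                             # hypothetical: number of clusters (trials) per group
successes  <- rbinom(n_groups, size = n_clusters, prob = 0.55)
counts     <- data.frame(successes, failures = n_clusters - successes)

# A two-column (successes, failures) response keeps the binomial denominators
m0 <- glm(cbind(successes, failures) ~ 1, data = counts, family = binomial)

intercept <- coef(m0)[["(Intercept)"]]       # estimate on the log-odds scale
pooled    <- plogis(intercept)               # back-transformed to a proportion
# Wald test of H0: log-odds = 0, which is the same as H0: proportion = 0.5
wald_p    <- coef(summary(m0))["(Intercept)", "Pr(>|z|)"]
```

In this sketch `plogis(intercept)` recovers the pooled proportion across all groups, so the intercept test speaks only to the overall average and says nothing about individual levels.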
But how can I tell whether any particular proportion differs more or less from 50% than the others? I was considering an effects-coded categorical factor, but that seems potentially odd since every row has its own level. It also produces a ton of coefficients.
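df$group <- as.factor(df$x)             # one level per row, built from the identifier x
contrasts(df$group) <- contr.sum(148)   # effects (sum-to-zero) coding
# With one observation per level, this model is saturated
m <- glm(props ~ group, data = df, family = binomial)
summary(m)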
For each group, I only have one observation. This may be odd, but this was originally multilevel data with one instance of each group within each cluster. I wanted to focus on the groups and on whether the average count (i.e., proportion) per group differs from 50%, so I collapsed the counts/binaries across clusters, giving me these proportions.
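Since each proportion comes from collapsing binary outcomes across clusters, one possible way to look at individual groups is an exact binomial test per group with a multiplicity correction. The sketch below is only an illustration under stated assumptions: `n_clusters` and the simulated `successes` are hypothetical stand-ins for the real collapsed counts, and Holm is just one of several adjustment methods that could be used.

```r
set.seed(2)                                  # reproducible simulated data
n_groups   <- 148
n_clusters <- 40                             # hypothetical: number of clusters per group
successes  <- rbinom(n_groups, size = n_clusters, prob = 0.5)

# Exact two-sided binomial test of H0: p = 0.5, one test per group
p_raw <- vapply(successes,
                function(k) binom.test(k, n_clusters, p = 0.5)$p.value,
                numeric(1))

# Holm adjustment controls the family-wise error rate across all groups
p_adj <- p.adjust(p_raw, method = "holm")

result <- data.frame(group = seq_len(n_groups),
                     prop  = successes / n_clusters,
                     p_raw = p_raw,
                     p_adj = p_adj)
head(result[order(result$p_adj), ])          # groups with the smallest adjusted p-values
```

Sorting by `p_adj` lists the groups whose proportions are least compatible with 50% after accounting for the number of tests.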
Is there a better way to model this? I was considering a χ² goodness-of-fit test, but the expected proportions there need to sum to 1, which would not be the case if the expected proportions are all .50.
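On the sum-to-1 concern: within a single group the two cells are "success" and "failure", so expected proportions of .50 and .50 do sum to 1. A rough sketch of that idea follows, again assuming the per-group counts are available; `n_clusters` and the simulated `successes` are hypothetical, and the omnibus step relies on the groups being independent.

```r
set.seed(3)                                  # reproducible simulated data
n_groups   <- 148
n_clusters <- 40                             # hypothetical: number of clusters per group
successes  <- rbinom(n_groups, size = n_clusters, prob = 0.5)

# One group: two cells whose expected proportions (0.5, 0.5) sum to 1
one_group <- chisq.test(c(successes[1], n_clusters - successes[1]),
                        p = c(0.5, 0.5))

# The same Pearson statistic for every group, computed by hand:
# (k - n/2)^2 / (n/2) + ((n - k) - n/2)^2 / (n/2) = (2k - n)^2 / n
x2_each <- (2 * successes - n_clusters)^2 / n_clusters

# Summing over independent groups gives an omnibus statistic with
# approximately one degree of freedom per group under H0: all p = 0.5
x2_total  <- sum(x2_each)
p_omnibus <- pchisq(x2_total, df = n_groups, lower.tail = FALSE)
```

Here `x2_each` plays the role of a per-group deviation from 50%, while `p_omnibus` addresses whether the set of proportions as a whole is compatible with 50%.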
https://stats.oarc.ucla.edu/stata/faq/how-does-one-do-regression-when-the-dependent-variable-is-a-proportion/
https://stats.stackexchange.com/questions/89734/glm-for-proportion-data-in-r
– JElder May 05 '22 at 18:58