Let's say I have a data frame that looks like this:
groups <- floor(runif(1000, min=1, max=5))
activity <- rep(c("A1", "A2", "A3", "A4"), times= 250)
endorsement <- floor(runif(1000, min=0, max=2))
value1 <- runif(1000, min=1, max=10)
area <- rep(c("A", "A", "A", "A", "B", "C", "C", "D", "D", "E"), times = 100)
df <- data.frame(groups, activity, endorsement, value1, area)
Printed:
> head(df)
  groups activity endorsement   value1 area
1      1       A1           0 7.443375    A
2      1       A2           0 4.342376    A
3      1       A3           0 4.810690    A
4      4       A4           0 3.494974    A
5      3       A1           1 6.442354    B
6      1       A2           0 9.794138    C
I want to run a logistic regression (predicting endorsement from groups), but if you look at the area variable, A is very well represented, whereas B and E are not.
I'm not interested in the area variable itself, but the results will be driven by the areas that are heavily represented in the dataset. I need to weight the data somehow, but I'm not sure of the correct way to do it.
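As a quick check of how uneven the representation is, the rows per area can be tabulated. This is a minimal sketch that rebuilds only the `area` vector from the example data; the name `area_counts` is introduced here for illustration.

```r
# Same area vector as in the example data
area <- rep(c("A", "A", "A", "A", "B", "C", "C", "D", "D", "E"), times = 100)

# Number of rows per area
area_counts <- table(area)
area_counts

# Share of rows per area
prop.table(area_counts)
```

In the example data this gives 400 rows for A, 200 each for C and D, and 100 each for B and E. The real dataset is described as far more lopsided than that.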
This is the model I'd like to run:
library(lsmeans)
model <- glm(endorsement ~ factor(groups), data=df, family=binomial(logit))
anova(model, test = "Chisq")
lsmeans(model, pairwise ~ groups)
Without any adjustment, the "main effect" of groups and any pairwise differences will primarily be driven by whatever effects are found in the most represented area (in the actual dataset, area A has about 100x more subjects than any other area).
What's the correct way to adjust for the unbalanced area representation? I thought about upsampling the under-represented areas (or even downsampling the over-represented one), but I suspect this would have adverse or artificial effects on the power of the test.
[...] area as a random effects variable help the model account for differences in area representation? – Simon Mar 27 '17 at 06:29
[...] area as an additive term in the model you specified. Then it will estimate a separate intercept for each area, and the lsmeans step will average the predictions for each area together, giving them equal weight. Thus the areas will be equally represented when it comes to summarizing via the group means and comparisons thereof. – Russ Lenth Mar 27 '17 at 20:50
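The additive-term suggestion in the last comment could be sketched roughly as follows. This is an illustration under assumptions, not a definitive analysis: the seed, the names `model_area`, `equal_wt` and `prop_wt`, and the choice to list `area` first in the formula are all introduced here. The `weights` argument of `lsmeans()` is spelled out only to make the equal-weight averaging visible, and `"proportional"` is shown purely for contrast.

```r
library(lsmeans)

set.seed(42)  # assumed seed, only so the sketch is reproducible
groups <- floor(runif(1000, min = 1, max = 5))
endorsement <- floor(runif(1000, min = 0, max = 2))
area <- factor(rep(c("A", "A", "A", "A", "B", "C", "C", "D", "D", "E"), times = 100))
df <- data.frame(groups, endorsement, area)

# area enters as an additive term. anova() on a glm tests terms sequentially,
# so listing area first means the test for groups is adjusted for area.
model_area <- glm(endorsement ~ area + factor(groups),
                  data = df, family = binomial(logit))
anova(model_area, test = "Chisq")

# weights = "equal" averages the per-area predictions with equal weight,
# so area A does not dominate the group means
equal_wt <- lsmeans(model_area, pairwise ~ groups, weights = "equal")

# weights = "proportional" weights each area by its frequency, for comparison
prop_wt <- lsmeans(model_area, pairwise ~ groups, weights = "proportional")

equal_wt
prop_wt
```

Because this model has no group-by-area interaction, the pairwise differences on the logit scale should be the same under both weighting schemes. Only the group means themselves shift with the weights.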