I have a dataset with two categorical predictors and a continuous response. Suppose the predictors have $a$ and $b$ levels, respectively, with group-specific parameters $\alpha_1, \dots, \alpha_a$ and $\beta_1, \dots, \beta_b$. I can implement a sum-to-zero constraint in a two-way ANOVA model such that
$$\sum_{i=1}^a \alpha_i = 0 \quad\text{and}\quad \sum_{j=1}^b \beta_j = 0,$$
but I want to impose a constraint under which the weighted average of each set of parameters is zero, with each level weighted by its number of observations. That is, if the levels of the first predictor have $m_1, \dots, m_a$ observations and the levels of the second predictor have $n_1, \dots, n_b$ observations, then I want
$$\sum_{i=1}^a m_i \alpha_i = 0 \quad\text{and}\quad \sum_{j=1}^b n_j \beta_j = 0.$$
Is it possible to impose this constraint and minimize the sum of squared errors? If so, how would it be implemented?
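One standard way to impose a linear constraint while still minimizing the sum of squared errors is to reparametrize: write $\alpha = C\gamma$ for an $a \times (a-1)$ matrix $C$ whose columns all satisfy the constraint, and fit the unconstrained $\gamma$ by ordinary least squares. For the weighted constraint, one possible choice (an assumption here, not the only one) stacks the identity $I_{a-1}$ on top of the row $(-m_1/m_a, \dots, -m_{a-1}/m_a)$, because then $\sum_i m_i \alpha_i = \sum_{i<a} m_i \gamma_i - \sum_{i<a} m_i \gamma_i = 0$. The sketch below compares that matrix with R's `contr.sum()`, which encodes the unweighted version. `contr_wsum()` is a hypothetical helper written for this illustration, and the counts are the ones implied by the simulation in the edit below.

```r
# Sketch (assumption): constraints imposed by reparametrizing alpha = C %*% gamma.
# contr_wsum() is a hypothetical helper, not part of base R.
contr_wsum <- function(m) {
  a <- length(m)
  # identity on top; last row chosen so that sum(m * (C %*% gamma)) == 0
  rbind(diag(a - 1), -m[-a] / m[a])
}

m <- c(4000, 3000, 1500, 1000, 500)   # level counts m_1..m_5 from the simulation
C_unw <- contr.sum(length(m))         # unweighted: sum(alpha) == 0
C_w   <- contr_wsum(m)                # weighted:   sum(m * alpha) == 0

# Every column already satisfies its constraint ...
colSums(C_unw)            # all zero
drop(crossprod(m, C_w))   # all zero (up to rounding)

# ... so alpha built from any free vector gamma of length a - 1 does too
gamma <- c(0.3, -0.2, 0.7, 0.1)
alpha_unw <- drop(C_unw %*% gamma)
alpha_w   <- drop(C_w %*% gamma)
sum(alpha_unw)       # zero up to rounding
sum(m * alpha_w)     # zero up to rounding
```

In a model fit, such a matrix can be passed through the `contrasts` argument of `lm()`, e.g. `contrasts = list(roleA = contr_wsum(as.vector(table(roleA))))`. Since $C$ has full column rank, the fitted values and the SSE are the same as under `contr.sum`; only the parameterization of the level effects changes.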
EDIT: A commenter asked why I want this constraint. I have a dataset composed of observations from a game in which there are two distinct roles (an A role and a B role). For each event in the game, the role A player and the role B player play against each other and generate a result. The dataset has two predictors (roleA and roleB, which are categorical variables containing the names of the role A and role B players) and the response is points, a continuous variable. The code below generates an example dataset. Note that the players do not play an equal number of times.
```r
N <- 10000
set.seed(1)

# create random matchups of role A and role B players
roleA <- c(rep('A1', N * .4), rep('A2', N * .3), rep('A3', N * .15),
           rep('A4', N * .1), rep('A5', N * .05))
roleA <- as.factor(sample(roleA, N, replace = FALSE))
roleB <- c(rep('B1', N * .4), rep('B2', N * .3), rep('B3', N * .2),
           rep('B4', N * .1))
roleB <- as.factor(sample(roleB, N, replace = FALSE))

# player names, in the same order as A_means and B_means below
roleA_players <- levels(roleA)
roleB_players <- levels(roleB)

# these vectors contain the average points created per event by each player;
# the weighted average is zero for both roles
A_means <- c(0.5, 0.1, 0, -1, -2.6)
B_means <- c(0.6, 0.3, -0.5, -2.3)

# points for each event are the sum of the points created by each player
get_points <- function(p1, p2) {
  r1 <- rnorm(1, mean = A_means[which(roleA_players == p1)], sd = 1)
  r2 <- rnorm(1, mean = B_means[which(roleB_players == p2)], sd = 1)
  r1 + r2
}

points <- mapply(get_points, p1 = roleA, p2 = roleB)
head(data.frame(roleA, roleB, points))
```
```
  roleA roleB     points
1    A1    B1 -0.7608573
2    A1    B2 -1.4209561
3    A2    B3 -1.4254282
4    A4    B1 -0.2304648
5    A1    B1  2.1109341
6    A4    B1  0.8142395
```
I want to estimate the effect that each player has on points. For my dataset, it is reasonable to assume that the weighted average effects of the role A players and of the role B players are both zero. However, because the players don't play an equal number of times, the parameters from a model with the conventional sum-to-zero (STZ) constraint won't have a weighted average of zero. That is why I am trying to implement the constraint described above.
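A minimal sketch of how this could look on data like the example above, assuming the weighted-contrast reparametrization ($\alpha = C\gamma$ with $\sum_i m_i \alpha_i = 0$ built into $C$). The helpers `contr_wsum()` and `level_effects()` are hypothetical and written only for this illustration. The simulation is a vectorized rewrite of the question's code, so its random draws, and therefore the estimates, differ from the output shown above. It fits the same additive model twice, once with `contr.sum` and once with the weighted coding, and compares the weighted averages of the per-level effects.

```r
# Data simulated as in the question, but vectorized (different random draws)
N <- 10000
set.seed(1)
roleA <- factor(sample(rep(paste0("A", 1:5), times = round(N * c(.4, .3, .15, .1, .05)))))
roleB <- factor(sample(rep(paste0("B", 1:4), times = round(N * c(.4, .3, .2, .1)))))
A_means <- c(0.5, 0.1, 0, -1, -2.6)
B_means <- c(0.6, 0.3, -0.5, -2.3)
# Indexing by a factor uses its integer codes, i.e. level order A1..A5 / B1..B4
points <- rnorm(N, A_means[roleA]) + rnorm(N, B_means[roleB])

# Hypothetical helper: columns satisfy sum(m * alpha) == 0
contr_wsum <- function(m) {
  a <- length(m)
  rbind(diag(a - 1), -m[-a] / m[a])
}
mA <- as.vector(table(roleA))
mB <- as.vector(table(roleB))

# Hypothetical helper: expand the (a - 1) and (b - 1) fitted coefficients
# into one effect per level (coefficient order: intercept, roleA, roleB)
level_effects <- function(fit, CA, CB) {
  cf <- coef(fit)
  a <- nrow(CA)
  b <- nrow(CB)
  list(alpha = unname(drop(CA %*% cf[2:a])),
       beta  = unname(drop(CB %*% cf[(a + 1):(a + b - 1)])))
}

CA_stz <- contr.sum(5)
CB_stz <- contr.sum(4)
CA_w <- contr_wsum(mA)
CB_w <- contr_wsum(mB)

fit_stz <- lm(points ~ roleA + roleB, contrasts = list(roleA = CA_stz, roleB = CB_stz))
fit_w   <- lm(points ~ roleA + roleB, contrasts = list(roleA = CA_w, roleB = CB_w))

eff_stz <- level_effects(fit_stz, CA_stz, CB_stz)
eff_w   <- level_effects(fit_w, CA_w, CB_w)

# Same column space, so identical fitted values (and identical SSE)
all.equal(fitted(fit_stz), fitted(fit_w))

# Weighted averages: nonzero under STZ, zero up to rounding under weighted STZ
c(stz = weighted.mean(eff_stz$alpha, mA), weighted = weighted.mean(eff_w$alpha, mA))
c(stz = weighted.mean(eff_stz$beta, mB), weighted = weighted.mean(eff_w$beta, mB))

# Equivalent shortcut: centre the STZ effects at their weighted mean
all.equal(eff_w$alpha, eff_stz$alpha - weighted.mean(eff_stz$alpha, mA))
```

Because both codings span the same column space, the two fits minimize the same SSE and give identical fitted values. In this additive model, where the levels of the two factors are crossed, the two sets of effects within a factor differ only by a constant shift absorbed by the intercept, so the weighted-STZ effects can also be obtained by subtracting the weighted mean from any other set of effects, which is what the last line checks.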