Consider a situation where there can be membership in group $A$, group $B$, both groups, or neither group. If we wanted to predict group membership probabilities from some covariate information, this would be a multilabel problem.
However, an alternative modeling strategy (I fear a rather poor one) might be to say that there are three categories ($A$, $B$, and neither) and proceed with the predictions under that assumption. Plenty of machine learning work has been done on similar problems with categorical outcomes (e.g., MNIST digit classification), and among the most basic ways to solve such a problem is multinomial logistic regression. (Whatever the particular modeling strategy, e.g., a neural network, the implied conditional distribution is a multinomial on one trial, one roll of the die.)
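To make that concrete, a three-category multinomial logistic regression of the kind I have in mind can be fit with `nnet::multinom`. This is just a minimal sketch with made-up data (the variable names and the simple simulated outcome are mine):

```r
library(nnet)

set.seed(1)
n <- 500
x <- runif(n)
# Hypothetical three-category outcome; here independent of x for simplicity
y <- factor(sample(c("A", "B", "Neither"), n, replace = TRUE))
fit <- nnet::multinom(y ~ x, trace = FALSE)
# Each row of predicted probabilities sums to 1: one roll of a three-sided die
probs <- predict(fit, type = "probs")
summary(rowSums(probs))
```

The point of the last two lines is that the model forces the three category probabilities to sum to one for every observation, which is exactly the constraint at issue below.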
If the $A$ and $B$ categories are mutually exclusive, I see no problem. There are just three categories, much as there are ten categories in the MNIST digits. However, $A$ and $B$ are not mutually exclusive. There are times when both occur, in addition to only one of the two occurring or neither occurring.
That concerns me immensely when it comes to fitting a multinomial logistic regression. If we code the categories with the standard $0$ and $1$ indicators, we wind up with something like:
$$ A: (0,1,0)\\ B: (0,0,1)\\ \text{Neither}: (1,0,0)\\ \text{Both}: (0,1,1) $$
A vector containing two $1$s contradicts the assumption of a conditional multinomial distribution on just one trial (one roll of the die): at least two trials are needed to produce a vector like $(0,1,1)$, rather than the single trial assumed by multinomial logistic regression in a multi-class problem like MNIST digit classification. And if there really were two trials, then vectors like $(1,0,1)$ and $(2,0,0)$ would also be possible, which we know they are not.
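A few lines of R make the trial-count issue explicit (this is my own illustration, not from the paper):

```r
# One-hot rows for a single multinomial trial each sum to 1
one_trial <- rbind(A = c(0, 1, 0), B = c(0, 0, 1), Neither = c(1, 0, 0))
rowSums(one_trial)
# The "Both" coding violates that constraint: its entries sum to 2,
# which requires at least two trials
both <- c(0, 1, 1)
sum(both)
# And two genuine multinomial trials can also yield (1,0,1) or (2,0,0),
# outcomes that are impossible in the severance-package setting
set.seed(1)
draws <- rmultinom(5, size = 2, prob = rep(1 / 3, 3))
t(draws)
```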
Thus, it seems like a mistake to shoehorn a multilabel problem into this kind of multinomial logistic regression.
However, does multinomial logistic regression exhibit robustness to this kind of apparent mistake? (Is it even a mistake at all?)
This comes from a paper (in a good journal for that field) I read in which executives could have one of two kinds of severance packages, both kinds, or neither. The regression modeling treated the problem as having three categories, seemingly ignoring the possibility of executives having both kinds of severance packages (which one of the authors confirmed to me is rather common). “But this is a multilabel problem,” I thought as I read it, especially after the author confirmed to me that the two severance packages are not mutually exclusive the way that “7” and “4” are among the MNIST digits.
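For what it's worth, a natural multilabel baseline here is "binary relevance": fit a separate binary logistic regression for each label. A minimal sketch with simulated data (the variable names and the conditional-independence assumption used to combine the two label probabilities are my own, not anything from the paper):

```r
set.seed(2023)
n <- 1000
x <- runif(n)
# Two non-exclusive labels (hypothetical severance-package indicators)
yA <- rbinom(n, 1, plogis(-0.5 + x))
yB <- rbinom(n, 1, plogis(0.3 - x))
# Binary relevance: one logistic regression per label
fitA <- glm(yA ~ x, family = binomial)
fitB <- glm(yB ~ x, family = binomial)
pA <- predict(fitA, type = "response")
pB <- predict(fitB, type = "response")
# If we are willing to assume the labels are conditionally independent
# given x, the four joint outcomes have probabilities:
p_both <- pA * pB
p_neither <- (1 - pA) * (1 - pB)
p_A_only <- pA * (1 - pB)
p_B_only <- (1 - pA) * pB
```

Under that conditional-independence assumption the four joint probabilities sum to one for each observation; richer multilabel methods relax the independence assumption.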
EDIT
In a simulation, this multinomial logistic regression approach gives a left-skewed distribution of p-values when the null hypothesis is true. This does not paint a rosy picture for the approach, even if it is not absolutely emphatic evidence that the approach is highly problematic.
library(ggplot2)
library(lmtest)
library(nnet)
set.seed(2023)
N <- 10000
B <- 1000
x <- runif(N, 0, 1)
ps_mlr <- rep(NA, B)
for (i in 1:B){
  # Simulate two binary labels (each is a type of severance package, not
  # necessarily mutually exclusive)
  y2 <- rbinom(N, 1, 0.5)
  y3 <- rbinom(N, 1, 0.5)
  # Create a multinomial-style indicator matrix from the multilabel data
  y <- cbind(rep(1, N), rep(0, N), rep(0, N))
  for (j in 1:N){
    if (y2[j] == 1){
      y[j, c(1, 2)] <- c(0, 1)
    }
    if (y3[j] == 1){
      y[j, c(1, 3)] <- c(0, 1)
    }
  }
  # Fit the full multinomial logistic regression
  L1 <- nnet::multinom(y ~ x, trace = FALSE)
  # Fit the null model
  L0 <- nnet::multinom(y ~ 1, trace = FALSE)
  # Likelihood ratio test of the full model against the null; store the p-value
  ps_mlr[i] <- lmtest::lrtest(L0, L1)$"Pr(>Chisq)"[2]
}
d_1 <- data.frame(
  p = ps_mlr,
  CDF = ecdf(ps_mlr)(ps_mlr),
  Group = "Null Hypothesis is True"
)
d_2 <- data.frame(
  p = seq(0, 1, 0.001),
  CDF = ecdf(seq(0, 1, 0.001))(seq(0, 1, 0.001)),
  Group = "Theoretical Distribution of p-values"
)
d <- rbind(d_1, d_2)
ggplot(d, aes(x = p, y = CDF, col = Group)) +
  geom_line()
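For contrast, and admittedly only as a rough check, here is what a well-behaved null simulation looks like when a single label is tested on its own with a binary logistic regression (my own sketch, with a smaller number of replications; under the null these p-values should be approximately uniform on $(0, 1)$):

```r
library(lmtest)

set.seed(2023)
B_small <- 200
ps_bin <- rep(NA, B_small)
for (i in 1:B_small){
  xb <- runif(1000)
  yb <- rbinom(1000, 1, 0.5)  # one label, independent of xb (null is true)
  fit_full <- glm(yb ~ xb, family = binomial)
  fit_null <- glm(yb ~ 1, family = binomial)
  # Likelihood ratio test of the covariate; store the p-value
  ps_bin[i] <- lmtest::lrtest(fit_null, fit_full)$"Pr(>Chisq)"[2]
}
```

Plotting `ecdf(ps_bin)` the same way as above gives a baseline to compare the multinomial results against.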
