The optimal threshold depends on your costs of mistakes. If it is "terrible" to call a $0$ a $1$ but merely "bad" to call a $1$ a $0$, then you might be inclined to favor $0$ as a prediction, requiring a very high probability of $1$, say $0.95$, before you would predict a $1$, even with perfect class balance. There could also be a third action to take (such as "unsure...collect more data"), even if the observed outcomes are binary, as is argued here and here.
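To make the cost argument concrete, here is a minimal sketch of how a cost ratio translates into a threshold. The costs `cost_fp` and `cost_fn` are hypothetical values chosen for illustration (they are not from any real problem), and the sketch assumes the predicted probabilities are well calibrated and that correct predictions cost nothing.

```r
# Hypothetical misclassification costs (assumptions for illustration only)
cost_fp <- 19  # cost of calling a 0 a 1 (the "terrible" mistake)
cost_fn <- 1   # cost of calling a 1 a 0 (the merely "bad" mistake)

# With P(y = 1) = p, predicting 1 has expected cost (1 - p) * cost_fp,
# and predicting 0 has expected cost p * cost_fn.
# Predicting 1 is cheaper when (1 - p) * cost_fp < p * cost_fn,
# that is, when p > cost_fp / (cost_fp + cost_fn).
cost_threshold <- cost_fp / (cost_fp + cost_fn)
cost_threshold

# Expected cost of each action at a given probability p
expected_cost <- function(p) {
  c(predict_1 = (1 - p) * cost_fp, predict_0 = p * cost_fn)
}
expected_cost(0.90)  # below the threshold: predicting 0 is cheaper
expected_cost(0.99)  # above the threshold: predicting 1 is cheaper
```

Under these assumed costs, a false positive being 19 times as costly as a false negative corresponds to the $0.95$ threshold mentioned above, and class balance does not enter the calculation at all.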
As for whether the threshold that optimizes Youden's criterion must equal the prior probability: it appears that this does not have to happen, as a simulation shows.
library(ModelMetrics)
set.seed(1998) # Yes, 1998 was 24 years ago,
# but 1998 gave me perfect class balance
N <- 1000
x <- runif(N, -2, 2)
z <- x
pr <- 1/(1 + exp(-z))
y <- rbinom(N, 1, pr)
ybar <- mean(y)
L <- glm(y ~ x, family = binomial)
probs <- 1/(1 + exp(-predict(L)))
categories <- round(probs)
ybar
# Youden's J = sensitivity (recall) + specificity (TNR) - 1
youden <- function(y, yhat, threshold){
  return(
    ModelMetrics::recall(y, yhat, threshold)
    +
    ModelMetrics::tnr(y, yhat, threshold)
    -
    1
  )
}
# Evaluate Youden's J over a grid of candidate thresholds
thresholds <- seq(0.01, 0.99, 0.01)
youdens <- rep(NA, length(thresholds))
for (i in seq_along(thresholds)){
  youdens[i] <- youden(y, probs, threshold = thresholds[i])
}
plot(thresholds, youdens)
abline(v = ybar)
The threshold that optimizes Youden's criterion is not $0.5$, despite the perfect class balance in y (given by ybar=0.5). If you fiddle with that simulation, you will see that the prior probability (ybar) need not be the threshold that optimizes Youden's criterion. Try that simulation with x <- runif(N, -2, 5) to get a different prior probability, and you will see that the threshold giving the optimal Youden criterion is near, but not exactly, the prior probability. This makes me think that, for question #3, nothing changes.
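Rather than reading the optimum off the plot, you can extract it directly. The following is a self-contained sketch in base R that repeats the simulation and reports the threshold that maximizes Youden's criterion next to the prior probability. The helper `youden_j` is my own illustrative stand-in for the ModelMetrics calls, and it assumes that a probability at or above the threshold counts as a predicted $1$, which may differ from the package's convention for exact ties.

```r
set.seed(1998)
N <- 1000
x <- runif(N, -2, 2)
y <- rbinom(N, 1, 1 / (1 + exp(-x)))

fit <- glm(y ~ x, family = binomial)
probs <- predict(fit, type = "response")  # fitted probabilities

# Youden's J = sensitivity + specificity - 1 (illustrative base-R helper)
youden_j <- function(y, p, threshold) {
  pred <- as.integer(p >= threshold)  # assumed tie convention: >= means 1
  sens <- mean(pred[y == 1] == 1)     # true positive rate
  spec <- mean(pred[y == 0] == 0)     # true negative rate
  sens + spec - 1
}

thresholds <- seq(0.01, 0.99, 0.01)
j <- sapply(thresholds, function(t) youden_j(y, probs, t))

# which.max returns the first maximizer if several thresholds tie
best <- thresholds[which.max(j)]
c(prior = mean(y), best_threshold = best, max_J = max(j))
```

Changing the range of x, for example to runif(N, -2, 5), lets you compare the two numbers under a different prior probability.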
I will leave the two usual blog posts to which I like to link when class imbalance comes up on here. The author, Vanderbilt's Frank Harrell, advocates for direct assessment of the probability outputs, not prematurely converting those probability outputs to discrete categories.
https://www.fharrell.com/post/classification/
https://www.fharrell.com/post/class-damage/