I was trying to train a neural network with ReLUs to learn to predict Boolean functions (with $+1, -1$ labels instead of $0, 1$). In particular, I tried the parity function, which is just the product of the coordinates, i.e. $f(x) = \prod^n_{i=1} x_i$.
Just for fun, I tried training the network with the l2 loss (instead of a softmax). I get a training error of about $10^{-6}$, but the accuracy (0-1 loss) is slightly below chance, around $0.49$. How is this possible? I know the two losses are not equivalent, but this seems super odd. Is it possible to have an extremely low l2 loss but still have bad accuracy?
I know I could just train with a softmax (logistic function) and the cross-entropy loss to make the modeling more appropriate, but the result above still puzzles me. Does someone know what is going on?
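To make the setup concrete, here is a rough sketch of the kind of experiment I mean (this is not my exact code; the architecture, input dimension `n`, sample counts, and optimizer settings are just placeholders):

```python
# Rough sketch: a small ReLU network trained with MSE on +/-1 parity labels,
# then evaluated with the 0-1 loss by thresholding the output at 0.
import torch
import torch.nn as nn

n = 10  # input dimension (placeholder)

def make_data(num_samples):
    # inputs are uniform in {-1, +1}^n; the label is the parity (product of coordinates)
    X = torch.randint(0, 2, (num_samples, n)).float() * 2 - 1
    y = X.prod(dim=1, keepdim=True)
    return X, y

X_train, y_train = make_data(5000)
X_test, y_test = make_data(5000)

model = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = mse(model(X_train), y_train)  # l2 / squared loss on the +/-1 targets
    loss.backward()
    opt.step()

with torch.no_grad():
    train_mse = mse(model(X_train), y_train).item()
    # 0-1 accuracy: predict the sign of the real-valued output
    test_acc = (model(X_test).sign() == y_test).float().mean().item()

print(f"train MSE = {train_mse:.2e}, test accuracy = {test_acc:.3f}")
```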
I am aware that the loss function essentially decides what our target function is. For example, the 0-1 loss leads to the Bayes decision rule, and the l2 loss, I think, gives something like:
$$ f(x) = E[Y \mid X=x]$$
but even if that's not the perfect modeling, I would expect the l2 loss to do at least a little better than chance. Especially in this case, since the targets are $\pm 1$: if the prediction is close to the right value, then at the very least its sign should match the right label.
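Just to spell out my reasoning (in case I am missing something): with $Y \in \{-1, +1\}$, the l2-optimal regression function already encodes the Bayes classifier, since

$$ E[Y \mid X=x] = (+1)\,P(Y=+1 \mid X=x) + (-1)\,P(Y=-1 \mid X=x) = 2\,P(Y=+1 \mid X=x) - 1, $$

so $\operatorname{sign}\!\big(E[Y \mid X=x]\big) = +1$ exactly when $P(Y=+1 \mid X=x) > 1/2$, which is the Bayes decision rule. So if the network were really close to this function, taking the sign of its output should classify well.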