
I was trying to train a neural network with ReLUs to learn to predict Boolean functions (with $+1, -1$ instead of $0, 1$). In particular, I tried the parity function, which is just the product of the coordinates, i.e. $f(x) = \prod^n_{i=1} x_i$.
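For concreteness, this is the target function I have in mind (a tiny R sketch; the function name is just for illustration):

# Parity of a {-1, +1} vector is just the product of its coordinates.
parity <- function(x) prod(x)
parity(c(+1, -1, -1, +1))  # even number of -1's -> +1
parity(c(+1, -1, +1, +1))  # odd number of -1's  -> -1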

Just for fun, I tried training the network with the l2 loss (instead of the softmax). The training error comes out at about $10^{-6}$, but the accuracy (0-1 loss) is slightly below chance, at $0.49$. How is this possible? I know the two are not equivalent, but this seemed super odd to me. Does someone know why this is happening? Is it possible to have an extremely low l2 loss but still have bad accuracy?
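For reference, here is roughly how the setup could be reproduced in R (a sketch only, not my exact code: nnet uses sigmoid hidden units rather than ReLUs, and linout = TRUE gives a linear output fit by least squares):

library(nnet)
set.seed(1)
n <- 4
X <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))       # all 2^n inputs in {-1, +1}^n
y <- apply(X, 1, prod)                                    # parity targets in {-1, +1}
fit <- nnet(X, y, size = 8, linout = TRUE, maxit = 2000)  # squared-error (l2) fit
preds <- predict(fit, X)
mean((preds - y)^2)     # training l2 loss
mean(sign(preds) == y)  # training accuracy when classifying by the sign of the output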


I know I could just train things with a softmax (logistic function) output and the cross-entropy loss to make the modelling more appropriate, but the result above still struck me as very odd.
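For completeness, the cross-entropy variant would look something like this (again only a sketch with nnet, whose entropy = TRUE option does maximum conditional likelihood fitting on 0/1 targets):

library(nnet)
set.seed(1)
n <- 4
X <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))             # same inputs as above
y01 <- (apply(X, 1, prod) + 1) / 2                              # recode parity from {-1, +1} to {0, 1}
fit_ce <- nnet(X, y01, size = 8, entropy = TRUE, maxit = 2000)  # cross-entropy fit
p <- predict(fit_ce, X)                                         # predicted probabilities
mean((p > 0.5) == y01)                                          # training accuracy at the 0.5 cutoff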


I am aware that the loss function essentially decides what our target function is. For example, the 0-1 loss leads to the Bayes decision rule, and I think the l2 loss gives something like:

$$ f(x) = E[Y \mid X=x]$$

but even if it is not the perfect modelling choice, I would expect the l2 loss to do at least a little better than chance, especially in this case: if the prediction is close to the correct $-1$ or $+1$ value, then in particular its sign should match the target.
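One way to make that intuition precise (a sketch, assuming the class decision is $\operatorname{sign}(f(x))$ and the loss and accuracy are computed on the same data): a sign error on a target $y \in \{-1, +1\}$ forces $(f(x) - y)^2 \ge 1$, so

$$ \Pr\big[\operatorname{sign}(f(X)) \neq Y\big] \;\le\; \mathbb{E}\big[(f(X) - Y)^2\big]. $$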

1 Answer


In the example below, I am able to score chance-level accuracy at a particular threshold (needed to convert your continuous predictions to discrete categories) despite what looks like a low square loss.

# Taken from: https://stats.stackexchange.com/a/46525/247274

library(pROC)
library(MLmetrics)

set.seed(2023)
N <- 1000
x1 <- rnorm(N)               # some continuous variables
x2 <- rnorm(N)
z <- 1 + 2*x1 + 3*x2         # linear combination with a bias
pr <- 1/(1 + exp(-z))        # pass through an inv-logit function
y <- rbinom(N, 1, pr)        # Bernoulli outcome variable

L <- glm(y ~ x1 + x2, family = "binomial")
preds <- 1/(1 + exp(-predict(L)))

r <- pROC::roc(y, preds)
thresholds <- r$thresholds
accuracies <- rep(NA, length(thresholds))
for (i in 1:length(thresholds)){
  idx1 <- which(preds > r$thresholds[i])
  yhat <- rep(0, N)
  yhat[idx1] <- 1
  accuracies[i] <- MLmetrics::Accuracy(yhat, y)
}

plot(thresholds, accuracies)
abline(h = mean(y))

mean(y)              # 0.593, so the chance-level accuracy is 59.3%
accuracies[811]      # 0.593, which is chance-level accuracy
thresholds[811]      # 0.9868599 is the threshold giving chance-level accuracy
mean((y - preds)^2)  # 0.08799204 looks pretty low to me
r$auc                # Area under the curve: 0.9468

Overall, this logistic regression model, along with a decision rule of using $0.9868599$ as the threshold for categorizing outcomes, scores chance-level accuracy. However, the square loss of the logistic regression model seems rather low at $0.08799204$, and the ROC AUC is quite high at $0.9468$.
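As a side check (assuming the objects from the code above are still in the workspace), you can compare this with the accuracy at the conventional $0.5$ threshold:

# Accuracy when categorizing at the usual 0.5 cutoff instead of 0.9868599
yhat_half <- as.numeric(preds > 0.5)
MLmetrics::Accuracy(yhat_half, y)

which underlines that the chance-level figure is a property of that particular decision rule rather than of the fitted probabilities themselves.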

Consequently, I am inclined to believe your results to be possible.

Dave