I was trying to train a neural network with ReLUs to learn to predict Boolean functions (with $+1, -1$ labels instead of $0, 1$). In particular, I tried the parity function, which is just the product of the coordinates, i.e. $f(x) = \prod^n_{i=1} x_i$.
Just for fun, I tried training the network with the l2 loss (instead of a softmax). I get a training error of about $10^{-6}$, but the accuracy (0-1 loss) is slightly below chance, around $0.49$. How is this possible? I know the two losses are not equivalent, but this seems super odd. Is it possible to have an extremely low l2 loss but still have bad accuracy?
I know I could just train with a softmax (logistic function) and the cross-entropy loss to make the modeling more appropriate, but the result above still puzzles me. Does someone know what is going on?
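To make the setup concrete, here is a rough sketch of the kind of experiment I mean (this is not my exact code; the architecture, input dimension `n`, sample counts, and optimizer settings are just placeholders):

```python
# Rough sketch: a small ReLU network trained with MSE on +/-1 parity labels,
# then evaluated with the 0-1 loss by thresholding the output at 0.
import torch
import torch.nn as nn

n = 10  # input dimension (placeholder)

def make_data(num_samples):
    # inputs are uniform in {-1, +1}^n; the label is the parity (product of coordinates)
    X = torch.randint(0, 2, (num_samples, n)).float() * 2 - 1
    y = X.prod(dim=1, keepdim=True)
    return X, y

X_train, y_train = make_data(5000)
X_test, y_test = make_data(5000)

model = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = mse(model(X_train), y_train)  # l2 / squared loss on the +/-1 targets
    loss.backward()
    opt.step()

with torch.no_grad():
    train_mse = mse(model(X_train), y_train).item()
    # 0-1 accuracy: predict the sign of the real-valued output
    test_acc = (model(X_test).sign() == y_test).float().mean().item()

print(f"train MSE = {train_mse:.2e}, test accuracy = {test_acc:.3f}")
```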
I am aware that the loss function essentially decides what our target function is. For example, the 0-1 loss leads to the Bayes decision rule, and the l2 loss, I think, gives something like:
$$ f(x) = E[Y \mid X=x]$$
but even if that's not the perfect modeling, I would expect the l2 loss to do at least a little better than chance. Especially in this case, since the targets are $\pm 1$: if the prediction is close to the right value, then at the very least its sign should match the right label.
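Just to spell out my reasoning (in case I am missing something): with $Y \in \{-1, +1\}$, the l2-optimal regression function already encodes the Bayes classifier, since

$$ E[Y \mid X=x] = (+1)\,P(Y=+1 \mid X=x) + (-1)\,P(Y=-1 \mid X=x) = 2\,P(Y=+1 \mid X=x) - 1, $$

so $\operatorname{sign}\!\big(E[Y \mid X=x]\big) = +1$ exactly when $P(Y=+1 \mid X=x) > 1/2$, which is the Bayes decision rule. So if the network were really close to this function, taking the sign of its output should classify well.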