Is it correct to say that after passing the raw logits from a model's inference through a softmax activation layer, I effectively obtain the probability prediction for each class? If so, is it more accurate to obtain the class prediction with something like np.argmax(predictions, 1), taking the index (i.e. the class) of the highest probability? I ask because I have seen others apply np.argmax(logits, 1) to the raw logits instead. Are the two entirely equivalent? Should I use the logits or the probabilities to get my labels?
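To make the comparison concrete, here is a minimal NumPy sketch of the two options I am asking about (the softmax helper and the toy logits are my own illustration, not from my model):

import numpy as np

def softmax(x):
    # numerically stable softmax along the last axis
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.5, 3.0, 0.2]])
toy_probs = softmax(toy_logits)

print(np.argmax(toy_logits, 1))  # [0 1] -- argmax over the raw logits
print(np.argmax(toy_probs, 1))   # [0 1] -- argmax over the probabilities

On toy inputs like this the two agree, which is partly why my real output below confuses me.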
To clarify, the model I'm training is a convolutional neural network, trained on images. Since I am using TensorFlow, my probability predictions are obtained as follows:
# the final fully connected layer outputs one raw score (logit) per class
logits = fully_connected(...)
# softmax normalises the logits into a probability distribution over the classes
probabilities = tf.nn.softmax(logits, name='Predictions')
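For context, this is roughly how I print the outputs shown below during training (sess, inputs and images_batch are simplified stand-ins for my actual training loop; predictions is defined further down):

# simplified sketch of my inspection step; sess, inputs and images_batch
# are placeholders for what my real training loop provides
logits_val, probs_val, preds_val = sess.run(
    [logits, probabilities, predictions],
    feed_dict={inputs: images_batch})
print('logits:\n', logits_val)
print('Probabilities:\n', probs_val)
print('predictions:\n', preds_val)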
The output I received is as follows:
logits:
[[-3.43802428 8.50315285 -2.49437261 -5.31596804 -0.89939809]
[-0.95422858 9.42916107 -3.32421923 -6.13104153 -2.98519015]
[ 8.39374065 -3.62434602 -2.13051629 -2.22547841 -2.52487397]
[-0.59439617 3.04803014 -0.29919145 -3.0748446 -0.47999749]
[ 5.72281456 -4.47873831 -5.08080578 4.947299 -4.22939491]
[-0.96076369 0.75843185 -4.78069353 6.38486814 -2.88907671]
[-4.09765959 -3.3006978 0.0887602 -1.27504754 6.07267427]
[-2.78308058 -0.20948838 -3.07063556 5.19085979 -1.45271111]
[ 0.56546986 -1.42026496 1.7502563 -0.76801473 -0.59683001]
[-0.93040967 -3.98949075 4.72442484 -1.89542389 0.66783226]]
Probabilities:
[[ 1.88053939e-02  7.99235344e-01  7.44791282e-03  3.45632341e-03  1.71054900e-01]
 [ 6.08554437e-05  5.32172574e-03  4.85260319e-03  5.05760590e-06  9.89759743e-01]
 [ 2.12738559e-01  5.80604553e-01  9.15991813e-02  8.34812075e-02  3.15765105e-02]
 [ 1.14435446e-04  4.85864328e-03  2.51556003e-06  9.95020747e-01  3.63525260e-06]
 [ 7.77163683e-03  2.52283569e-02  2.85758870e-03  9.63748634e-01  3.93733033e-04]
 [ 2.85534596e-04  5.73577709e-05  8.98985386e-01  1.66590798e-05  1.00655064e-01]
 [ 1.65269442e-03  6.96722636e-05  3.91014554e-09  9.98277664e-01  1.67898691e-08]
 [ 4.43775773e-01  3.91859515e-03  2.33097732e-01  2.69691706e-01  4.95162793e-02]
 [ 9.99397755e-01  5.45497285e-04  2.28157460e-05  2.26806569e-05  1.12535117e-05]
 [ 6.52832258e-03  4.78139009e-05  1.22661561e-01  2.00217139e-04  8.70562136e-01]]
predictions:
[4 1 4 3 0 2 2 1 4 1]
My predictions are obtained using tf.argmax(probabilities, 1), which works like np.argmax. My model's accuracy appears to increase consistently as I train, but the output above looks odd. For instance, in the first row both the highest logit and the highest probability are at index 1, yet the prediction is class 4; in the second row the highest logit is again at index 1, but the highest probability is at index 4, and the prediction is 1. I am rather confused here.
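In fact, running np.argmax over the printed arrays shows that nothing lines up (here with just the first two rows copied by hand, to keep the snippet short):

import numpy as np

# first two rows of the logits and probabilities printed above
logits_val = np.array([[-3.43802428, 8.50315285, -2.49437261, -5.31596804, -0.89939809],
                       [-0.95422858, 9.42916107, -3.32421923, -6.13104153, -2.98519015]])
probs_val = np.array([[1.88053939e-02, 7.99235344e-01, 7.44791282e-03, 3.45632341e-03, 1.71054900e-01],
                      [6.08554437e-05, 5.32172574e-03, 4.85260319e-03, 5.05760590e-06, 9.89759743e-01]])

print(np.argmax(logits_val, 1))  # -> [1 1]
print(np.argmax(probs_val, 1))   # -> [1 4]

Over all ten rows the logits give [1 1 0 1 0 3 4 3 2 2] and the probabilities give [1 4 1 3 3 2 3 0 0 4], and neither matches the printed predictions [4 1 4 3 0 2 2 1 4 1].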
I am concerned that the training has gone awry, but it is hard to tell from the output alone since there are only 5 classes. My loss has been decreasing, though not very steadily, which could be due to my small batch size.