Is it correct to say that after passing the raw logits from a model's inference through a softmax activation layer, I effectively obtain the probability prediction for each class? If so, is it more accurate to obtain the class prediction with something like np.argmax(predictions, 1), taking the index (i.e. the class) of the highest probability? I ask because I have seen others apply np.argmax(logits, 1) to the raw logits instead. Are the two entirely equivalent? Should I use the logits or the probabilities to get my labels?
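To make the comparison concrete, here is a minimal NumPy sketch of the two options I am asking about (the softmax helper and the toy logits are my own illustration, not from my model):

import numpy as np

def softmax(x):
    # numerically stable softmax along the last axis
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.5, 3.0, 0.2]])
toy_probs = softmax(toy_logits)

print(np.argmax(toy_logits, 1))  # [0 1] -- argmax over the raw logits
print(np.argmax(toy_probs, 1))   # [0 1] -- argmax over the probabilities

On toy inputs like this the two agree, which is partly why my real output below confuses me.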
To clarify, the model I'm training is a convolutional neural network, trained on images. Since I am using TensorFlow, my probability predictions are obtained as follows:
# the final fully connected layer outputs one raw score (logit) per class
logits = fully_connected(...)
# softmax normalises the logits into a probability distribution over the classes
probabilities = tf.nn.softmax(logits, name='Predictions')
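For context, this is roughly how I print the outputs shown below during training (sess, inputs and images_batch are simplified stand-ins for my actual training loop; predictions is defined further down):

# simplified sketch of my inspection step; sess, inputs and images_batch
# are placeholders for what my real training loop provides
logits_val, probs_val, preds_val = sess.run(
    [logits, probabilities, predictions],
    feed_dict={inputs: images_batch})
print('logits:\n', logits_val)
print('Probabilities:\n', probs_val)
print('predictions:\n', preds_val)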
The output I received is as follows:
logits:
[[-3.43802428 8.50315285 -2.49437261 -5.31596804 -0.89939809]
[-0.95422858 9.42916107 -3.32421923 -6.13104153 -2.98519015]
[ 8.39374065 -3.62434602 -2.13051629 -2.22547841 -2.52487397]
[-0.59439617 3.04803014 -0.29919145 -3.0748446 -0.47999749]
[ 5.72281456 -4.47873831 -5.08080578 4.947299 -4.22939491]
[-0.96076369 0.75843185 -4.78069353 6.38486814 -2.88907671]
[-4.09765959 -3.3006978 0.0887602 -1.27504754 6.07267427]
[-2.78308058 -0.20948838 -3.07063556 5.19085979 -1.45271111]
[ 0.56546986 -1.42026496 1.7502563 -0.76801473 -0.59683001]
[-0.93040967 -3.98949075 4.72442484 -1.89542389 0.66783226]]
Probabilities:
[[ 1.88053939e-02  7.99235344e-01  7.44791282e-03  3.45632341e-03  1.71054900e-01]
 [ 6.08554437e-05  5.32172574e-03  4.85260319e-03  5.05760590e-06  9.89759743e-01]
 [ 2.12738559e-01  5.80604553e-01  9.15991813e-02  8.34812075e-02  3.15765105e-02]
 [ 1.14435446e-04  4.85864328e-03  2.51556003e-06  9.95020747e-01  3.63525260e-06]
 [ 7.77163683e-03  2.52283569e-02  2.85758870e-03  9.63748634e-01  3.93733033e-04]
 [ 2.85534596e-04  5.73577709e-05  8.98985386e-01  1.66590798e-05  1.00655064e-01]
 [ 1.65269442e-03  6.96722636e-05  3.91014554e-09  9.98277664e-01  1.67898691e-08]
 [ 4.43775773e-01  3.91859515e-03  2.33097732e-01  2.69691706e-01  4.95162793e-02]
 [ 9.99397755e-01  5.45497285e-04  2.28157460e-05  2.26806569e-05  1.12535117e-05]
 [ 6.52832258e-03  4.78139009e-05  1.22661561e-01  2.00217139e-04  8.70562136e-01]]
predictions:
[4 1 4 3 0 2 2 1 4 1]
My predictions are obtained using tf.argmax(probabilities, 1), which works like np.argmax. My model's accuracy appears to increase consistently as I train, but the output above looks odd. For instance, in the first row both the highest logit and the highest probability are at index 1, yet the prediction is class 4; in the second row the highest logit is again at index 1, but the highest probability is at index 4, and the prediction is 1. I am rather confused here.
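In fact, running np.argmax over the printed arrays shows that nothing lines up (here with just the first two rows copied by hand, to keep the snippet short):

import numpy as np

# first two rows of the logits and probabilities printed above
logits_val = np.array([[-3.43802428, 8.50315285, -2.49437261, -5.31596804, -0.89939809],
                       [-0.95422858, 9.42916107, -3.32421923, -6.13104153, -2.98519015]])
probs_val = np.array([[1.88053939e-02, 7.99235344e-01, 7.44791282e-03, 3.45632341e-03, 1.71054900e-01],
                      [6.08554437e-05, 5.32172574e-03, 4.85260319e-03, 5.05760590e-06, 9.89759743e-01]])

print(np.argmax(logits_val, 1))  # -> [1 1]
print(np.argmax(probs_val, 1))   # -> [1 4]

Over all ten rows the logits give [1 1 0 1 0 3 4 3 2 2] and the probabilities give [1 4 1 3 3 2 3 0 0 4], and neither matches the printed predictions [4 1 4 3 0 2 2 1 4 1].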
I am concerned that the training has gone awry, but it is hard to tell from the output alone since there are only 5 classes. My loss has been decreasing, though not very steadily, which could be due to my small batch size.