I see that the image-classification models in the torchvision package don't have a softmax layer as their final layer. For instance, the following snippet shows that the resnet18 outputs don't sum to 1 along the class dimension, so a softmax layer is certainly absent:
from torchvision import models
import torch

model = models.resnet18(pretrained=False)
x = torch.rand(8, 3, 200, 200)   # batch of 8 random "images"
y = model(x)                     # shape (8, 1000): raw logits, one score per class
print(y.sum(dim=1))              # rows do not sum to 1
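For comparison, a minimal sketch continuing the snippet above: applying softmax manually does make each row sum to 1 (up to floating-point error), confirming that the model emits raw logits rather than probabilities.

probs = torch.softmax(y, dim=1)  # normalize logits into probabilities
print(probs.sum(dim=1))          # each entry is 1.0 (up to floating-point error)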
So the question is: why doesn't torchvision put a softmax layer at the end? And how much would adding a softmax layer improve performance, and why?
How do you map y to the class labels? What is the loss function that you will use to train this model? In NNs, softmax is nearly synonymous with classification, but there are lots of ways to train models to learn something about classes that are not, themselves, classification networks, because they are learning a representation, e.g. [tag:triplet-loss]. Likewise, there are alternatives to softmax for classification; comparing logits and probits is one example: https://stats.stackexchange.com/questions/20523/difference-between-logit-and-probit-models/30909#30909 – Sycorax Aug 31 '21 at 14:35
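One concrete way to see the comment's point about the loss function: PyTorch's nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so a trailing softmax layer would be redundant (and numerically less stable). A minimal training-step sketch, assuming the usual 1000-class ImageNet head and made-up random labels:

import torch
from torch import nn
from torchvision import models

model = models.resnet18(pretrained=False)
criterion = nn.CrossEntropyLoss()        # applies log-softmax + NLL internally

x = torch.rand(8, 3, 200, 200)
labels = torch.randint(0, 1000, (8,))    # hypothetical targets for the 1000 classes
logits = model(x)                        # raw scores, no softmax needed
loss = criterion(logits, labels)
loss.backward()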