Questions tagged [softmax]

A normalizing exponential function that transforms a numeric vector so that all its entries lie between 0 and 1 and sum to 1. It is often used as the final layer of a neural network performing a classification task.
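A minimal NumPy sketch of the function as described above (the max subtraction is a standard numerical-stability trick, not part of the definition):

    import numpy as np

    def softmax(z):
        # map a real vector to a probability vector: entries in (0, 1), summing to 1
        e = np.exp(z - np.max(z))  # subtract the max to avoid overflow in exp
        return e / e.sum()

    print(softmax(np.array([1.0, 2.0, 3.0])))  # [0.09003057 0.24472847 0.66524096]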

229 questions
9 votes · 1 answer

What is the gradient-log-normalizer?

In the wiki, the softmax function is defined as the gradient-log-normalizer of the categorical probability distribution. A partial explanation of the log-normalizer is found here, but what does "gradient-log-normalizer" stand for?
tashuhka • 607 • 1 • 7 • 17
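A quick numeric check of the terminology in this question: the log-normalizer of the categorical distribution in natural parameters is $A(z) = \log \sum_k e^{z_k}$, and its gradient is exactly the softmax, which is presumably what "gradient-log-normalizer" refers to. A sketch using a finite-difference gradient:

    import numpy as np

    z = np.array([0.5, -1.2, 2.0])

    def log_normalizer(z):
        # A(z) = log sum_k exp(z_k), the log-partition function of a categorical
        return np.log(np.sum(np.exp(z)))

    # finite-difference approximation of grad A at z
    eps = 1e-6
    grad = np.array([(log_normalizer(z + eps * np.eye(3)[i]) - log_normalizer(z)) / eps
                     for i in range(3)])

    softmax = np.exp(z) / np.exp(z).sum()
    print(np.allclose(grad, softmax, atol=1e-4))  # True: grad A(z) equals softmax(z)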
8 votes · 3 answers

Soft version of the maximum function?

In the book Deep Learning, it says the softmax function is in fact a soft version of the argmax function, and the corresponding soft version of the maximum function is $$\text{softmax} (z)^T z$$ How should I understand the latter?
JP Zhang • 241 • 2 • 11
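One way to see why $\text{softmax}(z)^T z$ is a soft maximum: it is an average of the entries of $z$ weighted by $\text{softmax}(z)$, and those weights concentrate on the largest entry as the entries separate. A small sketch (mine, not from the book):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def soft_maximum(z):
        # softmax(z)^T z: a weighted average leaning toward the largest entries
        return softmax(z) @ z

    z = np.array([1.0, 2.0, 3.0])
    print(soft_maximum(z), z.max())   # ~2.58 vs 3.0
    print(soft_maximum(10 * z) / 10)  # sharper logits pull the average toward the max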
4 votes · 2 answers

Softmax variant based on rank of vector indices

Given a $K$-dimensional real-valued vector $\mathbf{z} = (z_1, z_2, \ldots, z_K)$, I know that the softmax function returns a vector $\sigma(\mathbf{z})$ with positive elements summing to 1 via the following formula: $$ \sigma(\mathbf{z})_j =…
josliber • 4,367 • 28 • 43
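One plausible reading of a rank-based variant (my interpretation of the truncated question, not necessarily the asker's): replace each entry by its rank before exponentiating, so the output depends only on the ordering of $\mathbf{z}$, not its magnitudes:

    import numpy as np

    def rank_softmax(z):
        # rank 1 for the smallest entry, K for the largest (ties broken by position)
        ranks = np.argsort(np.argsort(z)) + 1
        e = np.exp(ranks - ranks.max())
        return e / e.sum()

    # any two vectors with the same ordering produce identical outputs
    print(rank_softmax(np.array([0.1, 5.0, 2.0])))
    print(rank_softmax(np.array([-3.0, 100.0, 0.5])))  # same ranks, same output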
2 votes · 1 answer

Why does deriving the softmax for a single vector come to 0 for me?

There is a lot of material explaining how to calculate the Jacobian for the softmax backwards pass, but I find it confusing how to get from the Jacobian to the actual errors. The obvious answer would be to sum up either the rows or the columns - it…
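For reference, the backward pass is a vector-Jacobian product, not a row or column sum: the gradient with respect to the input is $J^T \cdot \partial L / \partial s$, where $J$ is the softmax Jacobian. A sketch with a made-up upstream gradient:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.array([1.0, 2.0, 0.5])
    s = softmax(z)

    # Jacobian of softmax: J[i, j] = s_i * (delta_ij - s_j)
    J = np.diag(s) - np.outer(s, s)

    upstream = np.array([0.3, -0.1, 0.7])  # hypothetical dL/ds from the next layer
    grad_z = J.T @ upstream                # dL/dz, the vector-Jacobian product
    print(grad_z, grad_z.sum())            # columns of J sum to 0, so grad_z sums to 0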
2 votes · 1 answer

How does softmax relate/backpropagate probabilities with binary label?

When the softmax is the last layer in the neural network, it usually takes input from a fully connected layer (say of size 1000) and then outputs a probability for each of those outputs (if I understand it correctly). Assuming that our task is to…
cerebrou • 275
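In the common case where the softmax feeds a cross-entropy loss with a one-hot label $y$, the gradient reaching the logits collapses to $s - y$. A sketch of that identity with made-up sizes (5 instead of 1000), checked by finite differences:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.random.randn(5)              # logits from a fully connected layer
    y = np.array([0., 0., 1., 0., 0.])  # one-hot label
    s = softmax(z)

    loss = -np.sum(y * np.log(s))       # cross-entropy

    grad_z = s - y                      # chain rule through softmax gives s - y

    # finite-difference check of one coordinate
    eps = 1e-6
    z2 = z.copy(); z2[0] += eps
    loss2 = -np.sum(y * np.log(softmax(z2)))
    print(grad_z[0], (loss2 - loss) / eps)  # the two should closely agree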
1 vote · 1 answer

Confusion with softmax

I was wondering if someone could explain why, if I do softmax on [683, 861, 981, 834], I get [3.80403403e-130 7.66764807e-053 1.00000000e+000 1.44115655e-064], but if I take a factor of 100 out, [6.832, 8.61, 9.81, 8.34], then I get…
BigBadMe • 434
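The behavior described here is expected: softmax is not scale-invariant, so dividing the inputs by 100 genuinely changes the output, and at the original scale a naive implementation would overflow in exp, which is why the maximum is usually subtracted first. A quick demonstration:

    import numpy as np

    def softmax(z):
        z = np.asarray(z, dtype=float)
        e = np.exp(z - z.max())  # subtracting the max avoids overflow
        return e / e.sum()

    print(softmax([683, 861, 981, 834]))       # essentially one-hot on the third entry
    print(softmax([6.832, 8.61, 9.81, 8.34]))  # much softer: scale changes the result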
1 vote · 1 answer

Softmax where the max probability is less than one

I have a neural network with a softmax at the end. Something like this:

    def forward(self, x):
        x = self.conv(x)
        x = self.channel_transform_layer(x)
        output = self.softmax(x)
        return output

I would like the maximum value of the logits to…
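Assuming the truncated question is asking how to keep the maximum output probability below one (which I can't verify from the excerpt), one standard tool is a temperature $T > 1$, which flattens the distribution. A hypothetical PyTorch sketch:

    import torch
    import torch.nn as nn

    class TemperatureSoftmax(nn.Module):
        # softmax over logits / T; larger T flattens the output distribution
        def __init__(self, temperature=2.0, dim=-1):
            super().__init__()
            self.temperature = temperature
            self.dim = dim

        def forward(self, logits):
            return torch.softmax(logits / self.temperature, dim=self.dim)

    logits = torch.tensor([1.0, 4.0, 0.5])
    print(torch.softmax(logits, dim=-1).max())    # ~0.93 for these sharp logits
    print(TemperatureSoftmax(5.0)(logits).max())  # ~0.49: capped well below 1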
0 votes · 0 answers

Classification in the context of a time series

I would like to discuss how best to do this. Let me try to describe the situation as clearly and concisely as I can! (Stack: Python, Darts (from unit8), pandas, numpy) I have a building temperature time series, I perform…
MDSv • 33
0 votes · 2 answers

Softmax derivation - case i==j

I am reading this article about the derivative of softmax w.r.t. its input: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative. Please confirm my understanding of the case when j == i: \begin{equation} D_j S_i = \begin{cases} …
artona • 125 • 4
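For the j == i case the question asks about, the derivative works out to $D_j S_i = S_i (1 - S_i)$; a quick finite-difference check (my sketch, independent of the linked article's notation):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.array([0.2, 1.5, -0.3])
    s = softmax(z)
    i = 1  # check a diagonal entry of the Jacobian, dS_i / dz_i

    eps = 1e-6
    z2 = z.copy(); z2[i] += eps
    fd = (softmax(z2)[i] - s[i]) / eps

    print(fd, s[i] * (1 - s[i]))  # both approximate S_i (1 - S_i)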