Questions tagged [softmax]

A normalizing exponential function that transforms a numeric vector so that all its entries lie between 0 and 1 and sum to 1. It is often used as the final layer of a neural network performing a classification task.
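A minimal NumPy sketch of the function as described above (the max subtraction is a standard numerical-stability trick, not part of the definition):

    import numpy as np

    def softmax(z):
        # map a real vector to a probability vector: entries in (0, 1), summing to 1
        e = np.exp(z - np.max(z))  # subtract the max to avoid overflow in exp
        return e / e.sum()

    print(softmax(np.array([1.0, 2.0, 3.0])))  # [0.09003057 0.24472847 0.66524096]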

229 questions
9 votes · 1 answer

What is the gradient-log-normalizer?

In the wiki, the softmax function is defined as the gradient-log-normalizer of the categorical probability distribution. A partial explanation of the log-normalizer is found here, but what does "gradient-log-normalizer" stand for?
tashuhka • 607 • 1 • 7 • 17
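A quick numeric check of the terminology in this question: the log-normalizer of the categorical distribution in natural parameters is $A(z) = \log \sum_k e^{z_k}$, and its gradient is exactly the softmax, which is presumably what "gradient-log-normalizer" refers to. A sketch using a finite-difference gradient:

    import numpy as np

    z = np.array([0.5, -1.2, 2.0])

    def log_normalizer(z):
        # A(z) = log sum_k exp(z_k), the log-partition function of a categorical
        return np.log(np.sum(np.exp(z)))

    # finite-difference approximation of grad A at z
    eps = 1e-6
    grad = np.array([(log_normalizer(z + eps * np.eye(3)[i]) - log_normalizer(z)) / eps
                     for i in range(3)])

    softmax = np.exp(z) / np.exp(z).sum()
    print(np.allclose(grad, softmax, atol=1e-4))  # True: grad A(z) equals softmax(z)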
8 votes · 3 answers

Soft version of the maximum function?

In the book Deep Learning, it says the softmax function is in fact a soft version of the argmax function, and the corresponding soft version of the maximum function is $$\text{softmax} (z)^T z$$ How should I understand the latter?
JP Zhang • 241 • 2 • 11
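One way to see why $\text{softmax}(z)^T z$ is a soft maximum: it is an average of the entries of $z$ weighted by $\text{softmax}(z)$, and those weights concentrate on the largest entry as the entries separate. A small sketch (mine, not from the book):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def soft_maximum(z):
        # softmax(z)^T z: a weighted average leaning toward the largest entries
        return softmax(z) @ z

    z = np.array([1.0, 2.0, 3.0])
    print(soft_maximum(z), z.max())   # ~2.58 vs 3.0
    print(soft_maximum(10 * z) / 10)  # sharper logits pull the average toward the max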
4 votes · 2 answers

Softmax variant based on rank of vector indices

Given a $K$-dimensional real-valued vector $\mathbf{z} = (z_1, z_2, \ldots, z_K)$, I know that the softmax function returns a vector $\sigma(\mathbf{z})$ with positive elements summing to 1 via the following formula: $$ \sigma(\mathbf{z})_j =…
josliber • 4,367 • 28 • 43
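One plausible reading of a rank-based variant (my interpretation of the truncated question, not necessarily the asker's): replace each entry by its rank before exponentiating, so the output depends only on the ordering of $\mathbf{z}$, not its magnitudes:

    import numpy as np

    def rank_softmax(z):
        # rank 1 for the smallest entry, K for the largest (ties broken by position)
        ranks = np.argsort(np.argsort(z)) + 1
        e = np.exp(ranks - ranks.max())
        return e / e.sum()

    # any two vectors with the same ordering produce identical outputs
    print(rank_softmax(np.array([0.1, 5.0, 2.0])))
    print(rank_softmax(np.array([-3.0, 100.0, 0.5])))  # same ranks, same output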
2 votes · 1 answer

Why does deriving the softmax for a single vector come to 0 for me?

There is a lot of material explaining how to calculate the Jacobian for the softmax backwards pass, but I find it confusing how to get from the Jacobian to the actual errors. The obvious answer would be to sum up either the rows or the columns - it…
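For reference, the backward pass is a vector-Jacobian product, not a row or column sum: the gradient with respect to the input is $J^T \cdot \partial L / \partial s$, where $J$ is the softmax Jacobian. A sketch with a made-up upstream gradient:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.array([1.0, 2.0, 0.5])
    s = softmax(z)

    # Jacobian of softmax: J[i, j] = s_i * (delta_ij - s_j)
    J = np.diag(s) - np.outer(s, s)

    upstream = np.array([0.3, -0.1, 0.7])  # hypothetical dL/ds from the next layer
    grad_z = J.T @ upstream                # dL/dz, the vector-Jacobian product
    print(grad_z, grad_z.sum())            # columns of J sum to 0, so grad_z sums to 0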
2 votes · 1 answer

How does softmax relate/backpropagate probabilities with binary label?

When the softmax is the last layer in the neural network, it usually takes input from a fully connected layer (say of size 1000) and then outputs a probability for each of those outputs (if I understand it correctly). Assuming that our task is to…
cerebrou • 275
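In the common case where the softmax feeds a cross-entropy loss with a one-hot label $y$, the gradient reaching the logits collapses to $s - y$. A sketch of that identity with made-up sizes (5 instead of 1000), checked by finite differences:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.random.randn(5)              # logits from a fully connected layer
    y = np.array([0., 0., 1., 0., 0.])  # one-hot label
    s = softmax(z)

    loss = -np.sum(y * np.log(s))       # cross-entropy

    grad_z = s - y                      # chain rule through softmax gives s - y

    # finite-difference check of one coordinate
    eps = 1e-6
    z2 = z.copy(); z2[0] += eps
    loss2 = -np.sum(y * np.log(softmax(z2)))
    print(grad_z[0], (loss2 - loss) / eps)  # the two should closely agree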
1 vote · 1 answer

Confusion with softmax

I was wondering if someone could explain why, if I do softmax on [683, 861, 981, 834], I get [3.80403403e-130 7.66764807e-053 1.00000000e+000 1.44115655e-064], but if I take a factor of 100 out, [6.832, 8.61, 9.81, 8.34], then I get…
BigBadMe • 434
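The behavior described here is expected: softmax is not scale-invariant, so dividing the inputs by 100 genuinely changes the output, and at the original scale a naive implementation would overflow in exp, which is why the maximum is usually subtracted first. A quick demonstration:

    import numpy as np

    def softmax(z):
        z = np.asarray(z, dtype=float)
        e = np.exp(z - z.max())  # subtracting the max avoids overflow
        return e / e.sum()

    print(softmax([683, 861, 981, 834]))       # essentially one-hot on the third entry
    print(softmax([6.832, 8.61, 9.81, 8.34]))  # much softer: scale changes the result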
1 vote · 1 answer

Softmax where the max probability is less than one

I have a neural network with a softmax at the end. Something like this:

    def forward(self, x):
        x = self.conv(x)
        x = self.channel_transform_layer(x)
        output = self.softmax(x)
        return output

I would like the maximum value of the logits to…
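Assuming the truncated question is asking how to keep the maximum output probability below one (which I can't verify from the excerpt), one standard tool is a temperature $T > 1$, which flattens the distribution. A hypothetical PyTorch sketch:

    import torch
    import torch.nn as nn

    class TemperatureSoftmax(nn.Module):
        # softmax over logits / T; larger T flattens the output distribution
        def __init__(self, temperature=2.0, dim=-1):
            super().__init__()
            self.temperature = temperature
            self.dim = dim

        def forward(self, logits):
            return torch.softmax(logits / self.temperature, dim=self.dim)

    logits = torch.tensor([1.0, 4.0, 0.5])
    print(torch.softmax(logits, dim=-1).max())    # ~0.93 for these sharp logits
    print(TemperatureSoftmax(5.0)(logits).max())  # ~0.49: capped well below 1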
0 votes · 0 answers

Classification in the context of a time series

I would like to discuss how best to do this. Let me try to describe the situation as clearly and concisely as I can! (Stack: Python, Darts (from unit8), pandas, numpy) I have a building temperature time series, I perform…
MDSv • 33
0 votes · 2 answers

Softmax derivation - case i==j

I am reading this article about the derivative of softmax w.r.t. its input: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative. Please confirm my understanding of the case when j == i: \begin{equation} D_j S_i = \begin{cases} …
artona • 125 • 4
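For the j == i case the question asks about, the derivative works out to $D_j S_i = S_i (1 - S_i)$; a quick finite-difference check (my sketch, independent of the linked article's notation):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    z = np.array([0.2, 1.5, -0.3])
    s = softmax(z)
    i = 1  # check a diagonal entry of the Jacobian, dS_i / dz_i

    eps = 1e-6
    z2 = z.copy(); z2[i] += eps
    fd = (softmax(z2)[i] - s[i]) / eps

    print(fd, s[i] * (1 - s[i]))  # both approximate S_i (1 - S_i)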