When I read it I came up with the same question, and this is the answer I gave myself.
The assumption comes from the fact that we want to avoid small values of $z$ causing the maximum-likelihood objective (the sum of the logs of the probabilities) to underflow or overflow: if the probability of one of the two events were close to zero, its log would go towards minus infinity.
So to avoid this, we start by defining how to represent the log probabilities instead of the probabilities themselves, and these are chosen to be linear in $y$ and $z$.
This is reasonable since a Bernoulli variable can take only two values, 1 and 0: this linearity simply assigns an unnormalized log probability equal to $0$ to the output $y=0$ and a value equal to $z$ to the output $y=1$. In other words, we are only putting constraints on the way we use $z$, which is also our degree of freedom.
The subsequent normalization then enforces the constraint that the two probabilities sum to one.
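To make those last two steps concrete, here is a short sketch of the algebra, assuming (as in the usual construction) that the unnormalized log probability is written as $\log \tilde{P}(y) = yz$:

$$
\log \tilde{P}(y) = yz
\quad\Rightarrow\quad
\tilde{P}(y) = \exp(yz),
$$

$$
P(y) = \frac{\exp(yz)}{\exp(0\cdot z) + \exp(1\cdot z)}
     = \frac{\exp(yz)}{1 + \exp(z)}
     = \sigma\big((2y-1)z\big).
$$

So the normalization is exactly what turns the free quantity $z$ into a valid probability, and it comes out as the sigmoid of $z$ (for $y=1$) or of $-z$ (for $y=0$).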
I hope this helped a bit!