When I read it I came up with the same question, and this is the answer I gave myself.
The assumption comes from the fact that we want to avoid small values of $z$ causing the maximum-likelihood objective (the sum of the logs of the probabilities) to underflow or overflow: if the probability of one of the two events were close to zero, its log would go towards minus infinity.
So to avoid this, we start by defining how to represent the log probabilities instead of the probabilities themselves, and these are chosen to be linear in $y$ and $z$.
This is reasonable since a Bernoulli variable can take only two values, 1 and 0: this linearity simply assigns an unnormalized log probability equal to $0$ to the output $y=0$ and a value equal to $z$ to the output $y=1$. In other words, we are only putting constraints on the way we use $z$, which is also our degree of freedom.
The subsequent normalization then enforces the constraint that the two probabilities sum to one.
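To make those last two steps concrete, here is a short sketch of the algebra, assuming (as in the usual construction) that the unnormalized log probability is written as $\log \tilde{P}(y) = yz$:

$$
\log \tilde{P}(y) = yz
\quad\Rightarrow\quad
\tilde{P}(y) = \exp(yz),
$$

$$
P(y) = \frac{\exp(yz)}{\exp(0\cdot z) + \exp(1\cdot z)}
     = \frac{\exp(yz)}{1 + \exp(z)}
     = \sigma\big((2y-1)z\big).
$$

So the normalization is exactly what turns the free quantity $z$ into a valid probability, and it comes out as the sigmoid of $z$ (for $y=1$) or of $-z$ (for $y=0$).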
I hope this helped a bit!