8

I want to understand the logic behind defining ReLU as $\max(0,x)$ rather than $\min(0,x)$.

Why do we prefer positive inputs over the negative ones?

jsdbt
  • 249
  • FYI sometimes it is preferable to use leaky ReLUs – Franck Dernoncourt Apr 01 '17 at 18:13
  • 1
I don't think it should matter whether we use maximum or minimum, since using minimum should be able to produce the same outputs with the signs of the weights flipped. I am curious about setting the cutoff at zero, however. Using $\max(1,x)$ would mean that we could get the same output by changing the bias, but what consequences would there be for, e.g., numerical optimization or convergence speed? – Dave Jan 05 '23 at 17:03

1 Answer

6

The weights learned in a neural network can be both positive and negative, so in effect either form would work. Negating the input and output weights with the $\min$ form gives exactly the same function as the $\max$ form. The $\max$ form is used purely by convention.
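To make the equivalence concrete, here is a minimal numpy sketch (all layer sizes, weight names, and the random input are illustrative assumptions, not anything from a specific library): a one-hidden-layer network using $\max(0,x)$ is compared with the same network using $\min(0,x)$ after flipping the signs of the first layer's weights and biases and of the second layer's weights.

```python
import numpy as np

# Illustrative sketch: a tiny one-hidden-layer network with max(0, z) (ReLU)
# versus the same network with min(0, z) and sign-flipped weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights (assumed shapes)
b1 = rng.normal(size=4)        # hidden biases
W2 = rng.normal(size=(2, 4))   # hidden -> output weights
b2 = rng.normal(size=2)        # output biases
x = rng.normal(size=3)         # an arbitrary input

def max_net(x):
    h = np.maximum(0.0, W1 @ x + b1)        # ReLU: max(0, z)
    return W2 @ h + b2

def min_net(x):
    # Negate the incoming weights/biases and the outgoing weights,
    # then use min(0, z) in place of max(0, z).
    h = np.minimum(0.0, (-W1) @ x + (-b1))  # min(0, -z) = -max(0, z)
    return (-W2) @ h + b2

print(np.allclose(max_net(x), min_net(x)))  # True: both forms compute the same function
```

The key identity is $\min(0,-z) = -\max(0,z)$, so the hidden activations are simply negated, and negating the outgoing weights cancels that sign again.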

AaronDefazio
  • 1,614
Can I keep it as $x$ only? Sparsity can anyway be induced by dropout. (PS: Ignoring the non-linearity that the $\max$ or $\min$ form would introduce into the system.) – jsdbt Apr 01 '17 at 07:23
  • 4
    Without non-linearity, your network will compute just some linear function. No need to make it deep or anything. – Yuval Filmus Apr 01 '17 at 12:54
  • 1
This doesn't actually answer the question; why are we choosing activation functions that essentially kill negative values (this includes ReLU/GeLU et al.)? Saying that we need non-linear activation functions isn't an answer, as there exist an infinite number of activation functions that are differentiable, etc. – Vishal Jan 05 '23 at 17:40
  • 1
You only 'kill negative values' if you have a bias term of zero and positive weights. More accurately, you limit the range of the input on either side, at some value that can be learned through training. – Frans Rodenburg Jan 12 '23 at 16:04