8

I want to understand the logic behind defining ReLU as $\max(0,x)$ rather than $\min(0,x)$.

Why do we prefer positive inputs over the negative ones?

jsdbt
  • 249
  • FYI sometimes it is preferable to use leaky ReLUs – Franck Dernoncourt Apr 01 '17 at 18:13
  • 1
I don't think it should matter whether we use maximum or minimum, since using minimum should be able to produce the same outputs with the signs of the weights flipped. I am curious about setting the cutoff at zero, however. Using $\max(1,x)$ would mean that we could get the same output by changing the bias, but what consequences would there be for, e.g., numerical optimization or convergence speed? – Dave Jan 05 '23 at 17:03

1 Answer

6

The weights learned in a neural network can be both positive and negative, so in effect either form would work. Negating the input and output weights with the $\min$ form gives exactly the same function as the $\max$ form. The $\max$ form is used purely by convention.
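To make the equivalence concrete, here is a minimal numpy sketch (all layer sizes, weight names, and the random input are illustrative assumptions, not anything from a specific library): a one-hidden-layer network using $\max(0,x)$ is compared with the same network using $\min(0,x)$ after flipping the signs of the first layer's weights and biases and of the second layer's weights.

```python
import numpy as np

# Illustrative sketch: a tiny one-hidden-layer network with max(0, z) (ReLU)
# versus the same network with min(0, z) and sign-flipped weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights (assumed shapes)
b1 = rng.normal(size=4)        # hidden biases
W2 = rng.normal(size=(2, 4))   # hidden -> output weights
b2 = rng.normal(size=2)        # output biases
x = rng.normal(size=3)         # an arbitrary input

def max_net(x):
    h = np.maximum(0.0, W1 @ x + b1)        # ReLU: max(0, z)
    return W2 @ h + b2

def min_net(x):
    # Negate the incoming weights/biases and the outgoing weights,
    # then use min(0, z) in place of max(0, z).
    h = np.minimum(0.0, (-W1) @ x + (-b1))  # min(0, -z) = -max(0, z)
    return (-W2) @ h + b2

print(np.allclose(max_net(x), min_net(x)))  # True: both forms compute the same function
```

The key identity is $\min(0,-z) = -\max(0,z)$, so the hidden activations are simply negated, and negating the outgoing weights cancels that sign again.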

AaronDefazio
  • 1,614
Can I keep it as $x$ only? Sparsity can anyway be induced by dropout. (PS: Ignoring the non-linearity that the $\max$ or $\min$ form would introduce into the system.) – jsdbt Apr 01 '17 at 07:23
  • 4
    Without non-linearity, your network will compute just some linear function. No need to make it deep or anything. – Yuval Filmus Apr 01 '17 at 12:54
  • 1
This doesn't actually answer the question; why are we choosing activation functions that essentially kill negative values (this includes ReLU/GeLU et al.)? Saying that we need non-linear activation functions isn't an answer, as there exist an infinite number of activation functions that are differentiable, etc. – Vishal Jan 05 '23 at 17:40
  • 1
You only 'kill negative values' if you have a bias term of zero and positive weights. More accurately, you limit the range of the input on either side, at some value that can be learned through training. – Frans Rodenburg Jan 12 '23 at 16:04