
What is the derivative of the ReLU activation function defined as:

$$ \mathrm{ReLU}(x) = \max(0, x) $$

What about the special case at $x=0$, where the function has a kink and its derivative is discontinuous?

Tom Hale

1 Answer


The derivative is:

$$ \mathrm{ReLU}'(x)= \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases} $$

and it is undefined at $x=0$.

The reason it is undefined at $x=0$ is that the left- and right-hand derivatives are not equal: the left-hand derivative is $0$, while the right-hand derivative is $1$.
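
In practice, implementations simply pick a value for the point $x=0$. Here is a minimal NumPy sketch of that convention (the helper names `relu` and `relu_grad` are illustrative, not from any particular library); it takes the derivative at $x=0$ to be $0$, matching the convention mentioned in the comments below:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 where x > 0, else 0.
    The value at x == 0 is a convention; 0 is chosen here."""
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```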

Jim
  • So in practice (implementation), one just picks either $0$ or $1$ for the $x=0$ case? – Tom Hale Mar 14 '18 at 09:51
  • The convention is that $\frac{dr}{dx} = \mathbf{1}(x > 0)$ – neuroguy123 Mar 14 '18 at 13:10
  • @TomHale by the way, see Nouroz Rahman's answer at https://www.quora.com/How-do-we-compute-the-gradient-of-a-ReLU-for-backpropagation: "[...] In my view, in built-in library functions (for example: tf.nn.relu()) derivative at x = 0 is taken zero to ensure a sparser matrix..." – Jim Mar 29 '18 at 16:17