
I know that for big datasets we should consider computational effort and try to minimize execution time, as long as doing so does not harm quality.

In many models, such as logistic regression and neural networks (and probably elsewhere), the sigmoid function is used as part of the cost function. Input values in the range $(-\infty, -4]$ give outputs of approximately 0.01 or less, and input values in the range $[4, \infty)$ give outputs of approximately 0.99 or more.

Could we just map such values to 0.01 or 0.99 directly to improve calculation speed?
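For concreteness, here is a minimal sketch of that idea in plain Python (the function name and the exact thresholds are only illustrative, taken from the numbers above):

from math import exp

def approx_sigmoid(x):
    # Proposed shortcut: skip exp() entirely for strongly saturated inputs.
    if x <= -4.0:
        return 0.01
    if x >= 4.0:
        return 0.99
    return 1.0 / (exp(-x) + 1.0)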

Does it make sense to trade some inaccuracy in the cost function for better performance?

Denis

1 Answer


If you're using a numerically stable version of the sigmoid function, something along the lines of your proposal is already done to prevent overflow. In the function

$$f(x) = \frac{1}{\exp(-x)+1},$$ large negative values of $x$ cause $\exp(-x)$ to overflow numerically. To remediate that, a sigmoid implementation might look something like

from math import exp

def sigmoid(x):
    # Guard against overflow: for very negative x, exp(-x) would blow up,
    # so return the saturated value directly.
    if x < -20.0:
        return 0.0
    else:
        return 1.0 / (exp(-x) + 1.0)

We don't need to worry about the case of large $x$, because if $x$ is very large, then $\exp(-x)$ underflows to zero and we simply have $\frac{1}{0+1}=1$.
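To see this concretely (assuming IEEE double precision, as used by Python's math module):

from math import exp

x = 800.0                        # a very large positive input
print(exp(-x))                   # 0.0: exp(-x) underflows to zero
print(1.0 / (exp(-x) + 1.0))     # 1.0: the sigmoid saturates cleanly without any guard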

The value of -20 is chosen to be close to where overflow would result in a NaN return. A different choice could be more appropriate depending on the floating point precision and the particular usage. In particular, we might want to pick a well-chosen value to preclude erratic behavior near the cusp of loss of precision.
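As a rough illustration of how the cutoff depends on precision, the point at which $\exp(-x)$ overflows differs between single and double precision. The sketch below uses NumPy only to control the dtype; the exact boundary values are approximate.

import numpy as np

with np.errstate(over="ignore"):
    print(np.exp(np.float32(89.0)))   # inf: float32 overflows here, i.e. for sigmoid inputs near -89
    print(np.exp(np.float64(89.0)))   # about 4.5e38: still finite in float64
    print(np.exp(np.float64(710.0)))  # inf: float64 overflows here, i.e. for sigmoid inputs near -710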

The purpose is not to conserve compute time, because the difference between 0.01 and 0.001 can be very important to your computation. Throwing away that precision could give bogus results, such as stopping a gradient-based method in its tracks because the gradient suddenly becomes zero. Whether or not it's a good idea to compromise your computation's precision to get a small performance increase should be decided on a case-by-case basis, since the cost of imprecision could be very high in one instance but negligible in another.
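As a concrete illustration of the gradient point (a small sketch using the fact that the sigmoid's derivative is $f(x)\,(1 - f(x))$):

from math import exp

x = -6.0
s = 1.0 / (exp(-x) + 1.0)   # the exact sigmoid value at x = -6
print(s)                    # about 0.0025: small, but nonzero
print(s * (1.0 - s))        # about 0.0025: the gradient still carries usable signal
# If the output were hard-clipped to a constant such as 0.01, its derivative
# with respect to x would be exactly zero, and gradient-based updates would stall.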

Sycorax