For the sake of notation, let's suppose that $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ (i.e., it's a vector-valued function that takes a vector as input and outputs a vector of the same size). There are two concerns: computational cost and numerical accuracy.
Calculating the derivative $\mathrm{D}f(x)$ (the Jacobian matrix, $J(x)$, or $(\nabla f(x))^{T}$, or whatever you prefer) using one-sided finite differences requires $n$ function evaluations in addition to $f(x)$ itself, because, by definition, column $i$ of the Jacobian is
\begin{align}
\mathrm{D}f(x)e_{i} = \lim_{\varepsilon \rightarrow 0} \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon}
\end{align}
for each $i = 1, \ldots, n$, assuming you don't do any sort of "smart finite differencing" (like Curtis-Powell-Reid) because you know (or can detect) the sparsity pattern of $\mathrm{D}f$. If $n$ is large, that could be a lot of function evaluations. If you have an analytical expression for $\mathrm{D}f$, then calculating it could be cheaper. Automatic (also known as algorithmic) differentiation methods can also be used in some cases to calculate $\mathrm{D}f$ at roughly 3 to 5 times the cost of a function evaluation.
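To make the cost concrete, here is a minimal sketch of the one-sided finite-difference Jacobian in Python/NumPy; the loop over the coordinate directions $e_i$ is exactly where the $n$ extra function evaluations come from. The helper name `fd_jacobian` and the test function are illustrative, not from any particular library:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-8):
    """Approximate Df(x) column by column with forward differences:
    one evaluation of f at the base point, plus n more, one per e_i."""
    x = np.asarray(x, dtype=float)
    fx = f(x)                        # 1 evaluation
    J = np.empty((fx.size, x.size))
    for i in range(x.size):          # n more evaluations
        x_step = x.copy()
        x_step[i] += eps
        J[:, i] = (f(x_step) - fx) / eps
    return J

# Example: f(x) = (x0**2 + x1, sin(x0*x1)); compare against the exact Jacobian
# [[2*x0, 1], [x1*cos(x0*x1), x0*cos(x0*x1)]] at x = (1, 2).
f = lambda x: np.array([x[0]**2 + x[1], np.sin(x[0] * x[1])])
print(fd_jacobian(f, np.array([1.0, 2.0])))
```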
There are also numerical concerns. On a computer, we can't actually take the limit as $\varepsilon \rightarrow 0$, so when we approximate $\mathrm{D}f$, we're really picking $\varepsilon$ to be "small" and calculating
\begin{align}
\mathrm{D}f(x)e_{i} \approx \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon},
\end{align}
where $\approx$ means it's an approximation that we hope is a good one. Calculating this approximation in floating point arithmetic is tricky: pick $\varepsilon$ too large and truncation error makes the approximation poor, but pick $\varepsilon$ too small and rounding error from subtracting nearly equal values of $f$ dominates. A common rule of thumb for forward differences is to take $\varepsilon$ on the order of $\sqrt{u}$ (roughly $10^{-8}$ in double precision, where $u$ is the unit roundoff), scaled by the magnitude of $x$. The Wikipedia article on numerical differentiation covers these effects briefly; more detailed references can be found within the article.
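A quick way to see the tradeoff is to scan $\varepsilon$ over several orders of magnitude for a scalar function whose derivative is known exactly; a minimal demo sketch:

```python
import numpy as np

# Scalar demo: f(x) = exp(x), so the exact derivative at x is exp(x) too.
f = np.exp
x = 1.0
exact = np.exp(x)

for k in range(1, 17):
    eps = 10.0 ** (-k)
    approx = (f(x + eps) - f(x)) / eps
    print(f"eps = 1e-{k:02d}   |error| = {abs(approx - exact):.3e}")
```

In double precision, the error typically bottoms out near $\varepsilon \approx 10^{-8}$ (about $\sqrt{u}$) and then grows again as rounding error takes over, matching the rule of thumb above.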
If the error in the Jacobian matrix $\mathrm{D}f$ isn't too large, Newton-Raphson iterations will still converge locally, though the convergence rate may degrade from quadratic toward linear. For a detailed theoretical analysis, see Chapter 25 of Accuracy and Stability of Numerical Algorithms by Nick Higham, or the paper by Françoise Tisseur on which it is based.
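For intuition (this is no substitute for the analysis in those references), here is a bare-bones Newton-Raphson sketch using the finite-difference Jacobian from above; the safeguards a real library would add (line searches, trust regions, scaling) are deliberately omitted:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-8):
    """Forward-difference Jacobian (same sketch as earlier)."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        J[:, i] = (f(x_step) - fx) / eps
    return J

def newton_fd(f, x0, tol=1e-10, maxit=50):
    """Newton-Raphson for f(x) = 0 with an approximate (FD) Jacobian.
    Newton tolerates modest Jacobian error, usually at the price of a
    slower convergence rate."""
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        x -= np.linalg.solve(fd_jacobian(f, x), fx)
    return x

# Solve the 2-by-2 system x0**2 + x1 = 1, x0 - x1 = 0; the root is
# x0 = x1 = (sqrt(5) - 1)/2 ~= 0.6180.
f = lambda x: np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1]])
print(newton_fd(f, np.array([1.0, 1.0])))
```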
Libraries generally take care of these algorithmic details for you, and library implementations of the Newton-Raphson algorithm (or variants thereof) usually converge quite nicely. Every so often, though, a problem will cause trouble due to the drawbacks above. In the scalar case $(n = 1)$, I'd use Brent's method, owing to its robustness and good convergence rate in practice.
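In SciPy, for instance, Brent's method is available as `scipy.optimize.brentq`; it needs only a bracket $[a, b]$ with $f(a)$ and $f(b)$ of opposite signs, and no derivative at all:

```python
from scipy.optimize import brentq

# Find the root of f(x) = x**3 - 2*x - 5 on [2, 3], where f(2) = -1 < 0 < f(3) = 16.
f = lambda x: x**3 - 2*x - 5
root = brentq(f, 2.0, 3.0)
print(root, f(root))   # root ~= 2.0945515, residual ~ 0
```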