I'm writing a matrix library in c++. After some debugging I found that a simple double difference is not zero for two "equals" numbers. This is due how double are represented in a computer of course. So my question is, what I could use to improve the numeric stability about subtracting matrix (or vectors) and possibly other basic operations? I tried to use long double but I have the same problem. The same algorithm in MATLAB works so MATLAB must use some trick about internal number representation.
1 Answers
This is comment, but too long for comment so I write here. I think is better you edit the post with more information, now I explain why.
Here you can find some general advice to improve precision.
The problem you find I think it's possible that it come from the floating point representation and the relative effect in the machine operation. When you do an add or a subtraction the machine operation are these: $$ x \oplus y = \mathbf{fl}(\mathbf{fl}(x) + \mathbf{fl}(y))\\ x \ominus y = \mathbf{fl}(\mathbf{fl}(x) - \mathbf{fl}(y)) $$ where $\mathbf{fl}(x)$ is the floating point number of $x$. You find that the error estimates are: $$ \text{err}_{x,y}^{\oplus} \quad \alpha \quad \left| \frac{x}{x+y} \right|\text{err}_x + \left| \frac{y}{x+y} \right|\text{err}_y \\ \text{err}_{x,y}^{\ominus} \quad \alpha \quad \left| \frac{x}{x-y} \right|\text{err}_x + \left| \frac{y}{x-y} \right|\text{err}_y\\ \text{whit} \quad \text{err}_x = \frac{| x - \mathbf{fl}(x) |}{|x|} $$
You can see that if $x+y \approx 0$ in the sum or $x-y \approx 0$ in the subtraction you can have got a accuracy loss.
The principal trick is to avoid potential dangerous subtraction/sum and reformulate the algorithm in numerical stable version. The typical example, that you can find in many textbook, is about the roots of a second degree polynomial. You can write the polynomial $ax^2 + bx + c$ as $x^2 + 2px - q$. In this form, if $\sqrt{p^2 + q} \geq 0$, the roots are: $$ y = -p \pm \sqrt{p^2 + q} $$ Now if $p >>q$ one of the root is potentially dangerous because $\sqrt{p^2 + q} \approx p$ and you obtain a dangerous subtraction.
The solution is to use a different formulation: $$ y = -p + \sqrt{p^2 + q} = \frac{(-p + \sqrt{p^2 + q}) (p + \sqrt{p^2 + q})}{(p + \sqrt{p^2 + q})} = \frac{q}{(p + \sqrt{p^2 + q})} $$ This is a stable version.
Maybe the matlab code uses a more stable version of the algorithm. Try to see if the problem might be due to this (of course, each case must be treated separately) and edit the question. Anyway as @Wrzlprmft suggested edit the question.
Reference
I report a couple of references. In [1] there is chapter 1 about floating point with error propagation, there is also the above example. The book [2] is a good and complete reference about accuracy and stability for many intentions.
[1] Stoer, J.; Bulirsch, R., Introduction to numerical analysis. Translated from the German by R. Bartels, W. Gautschi and C. Witzgall., Texts in Applied Mathematics. 12. New York: Springer-Verlag. xiii, 660 p. (1993). ZBL0771.65002.
[2] Higham, Nicholas J., Accuracy and stability of numerical algorithms., Philadelphia, PA: SIAM. xxx, 680 p. (2002). ZBL1011.65010.
- 1,340
- 1
- 11
- 21
-
Good you took your time for explaining. I wonder whether you could recommend a textbook on the caveats floating point calculations. I am aware of Goldbach's famous paper, but there does not seem to be too much around wrt. floating point issues in numerical linear algebra. – shuhalo Mar 12 '17 at 14:54
-
1
subtract_and_roundinstead ofsubtract, for example). Please [edit] your question to clarify what your actual problem is. Be prepared that your question may be off-topic here (I cannot tell yet) and belong on [so] instead. – Wrzlprmft Mar 12 '17 at 09:05