Is there an accurate approximation of the acos() function?

Question

I need an acos() function with double precision within a compute shader. Since there is no built-in function of acos() in GLSL with double precision, I tried to implement my own.

At first, I implemented a Taylor series like the equation from Wiki - Taylor series with precalculated faculty values. But that seems to be inaccurate around 1. The maximum error was something about 0.08 with 40 iterations.

I also implemented this method which works very well on CPU with a maximum error of -2.22045e-16, but I have some troubles to implement this within the shader.

Currently, I am using an acos() approximation function from here where somebody posted his approximation functions on this site. I am using the most accurate function of this site and now I get a maximum error of -7.60454e-08, but also that error is too high.

My code of this function is:

double myACOS(double x)
{
    double part[4];
    part[0] = 32768.0/2835.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+sqrt(2.0+2.0*x))));
    part[1] = 256.0/135.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+2.0*x)));
    part[2] = 8.0/135.0*sqrt(2.0-sqrt(2.0+2.0*x));
    part[3] = 1.0/2835.0*sqrt(2.0-2.0*x);
    return (part[0]-part[1]+part[2]-part[3]);
}

Does anybody know another implementation method of acos() which is very accurate and -if possible- easy to implement in a shader?

Some system information:

Nvidia GT 555M
running OpenGL 4.3 with optirun

why do you need acos? if it is for slerp you can divide and conquer with repeated lerps — ratchet freak, Mar 10 '15 at 16:36
there is a standard [acos](http://www.cplusplus.com/reference/cmath/acos/) in `` — NathanOliver, Mar 10 '15 at 16:36
Holy crap, use a lookup table if you need that many `sqrt`s. — Andon M. Coleman, Mar 10 '15 at 16:36
@AndonM.Coleman I would use one, but this was just for the proof of concept. And as I said accuracy is more important then performance. — DanceIgel, Mar 10 '15 at 16:39
I'm wondering whether these approximations are actually faster... Did you ran any benchmarks? — Willem Van Onsem, Mar 10 '15 at 16:43
@CommuSoft I am not interested in the performance, so I didn't. I just tested the accuracy on the CPU against the implementation of acos() from math.h. — DanceIgel, Mar 10 '15 at 16:45
What are you using the returned angle for 9 out of 10 you can get to the result you want without acos — ratchet freak, Mar 10 '15 at 17:24
What's the range of arguments you need to support? [-1.0, 1.0]? — Reto Koradi, Mar 10 '15 at 17:25
@ratchetfreak To solve a complex equation in a kind of MLS algorithm. — DanceIgel, Mar 10 '15 at 17:31
What's your target hardware then? GPU with Compute 1.3 or higher - NVIDIA cards GT200 and up, support double precision arithmetic. I assume these functions have double precision too. — Robinson, Mar 10 '15 at 17:48
@Robinson I added the information to the question. In OpenCL and CUDA are double precision functions for 'acos()', but not within a shader. Unfortunately I am a bit restricted to compute shaders. — DanceIgel, Mar 10 '15 at 17:54
"I have some troubles to implement this within the shader."...such as? — genpfault, Mar 10 '15 at 19:57
Perhaps, (1) asin(x) = x + 1/2 (x^3/3) + (1/2)(3/4)(x^5/5) + (1/2)(3/4)(5/6)(x^7/7) + ... (2) acos(x) = (pi/2 - asin(x)). As a quick test, I got asin = 0.89726889283942934 with the stdlib giving 0.90384998229123237. For acos(x) I got 0.67415967858914205 with the stdlib giving 0.66694634450366430. Not sure how many iterations for it to converge to the kind of precision you want. — Robinson, Mar 10 '15 at 23:28
To add, there's a different way of calculating it around 1 because as you say it's inaccurate. There's a question here about it: http://stackoverflow.com/questions/20196000/own-asin-function-with-taylor-series-not-accurate — Robinson, Mar 11 '15 at 10:05
@Robinson Your first comment is exactly the Taylor series Ialready implemented. But i will have a look at the other question. — DanceIgel, Mar 11 '15 at 10:13

score 6 · Answer 1 · edited May 23 '17 at 11:55

The NVIDIA GT 555M GPU is a device with compute capability 2.1, so there is native hardware support for basic double-precision operations, including fused multipy-add (FMA). As with all NVIDIA GPUs, the square root operation is emulated. I am familiar with CUDA, but not GLSL. According to version 4.3 of the GLSL specification, it exposes double-precision FMA as a function fma() and provides a double-precision square root, sqrt(). It is not clear whether the sqrt() implementation is correctly rounded according to IEEE-754 rules. I will assume it is, by analogy with CUDA.

Instead of using a Taylor series, one would want to use a polynomial minimax approximation, thereby reducing the number of terms required. Minimax approximations are typically generated using a variant of the Remez algorithm. To optimize both speed and accuracy, the use of FMA is essential. Evaluation of the polynomial with the Horner scheme is condusive to high accruacy. In the code below, a second order Horner scheme is utilized. As in DanceIgel's answer, acos is conveniently computed using an asin approximation as the basic building block in conjunction with standard mathematical identities.

With 400M test vectors, the maximum relative error seen with the code below was 2.67e-16, while the maximum ulp error observed is 1.442 ulp.

/* compute arcsin (a) for a in [-9/16, 9/16] */
double asin_core (double a)
{
    double q, r, s, t;

    s = a * a;
    q = s * s;
    r =             5.5579749017470502e-2;
    t =            -6.2027913464120114e-2;
    r = fma (r, q,  5.4224464349245036e-2);
    t = fma (t, q, -1.1326992890324464e-2);
    r = fma (r, q,  1.5268872539397656e-2);
    t = fma (t, q,  1.0493798473372081e-2);
    r = fma (r, q,  1.4106045900607047e-2);
    t = fma (t, q,  1.7339776384962050e-2);
    r = fma (r, q,  2.2372961589651054e-2);
    t = fma (t, q,  3.0381912707941005e-2);
    r = fma (r, q,  4.4642857881094775e-2);
    t = fma (t, q,  7.4999999991367292e-2);
    r = fma (r, s, t);
    r = fma (r, s,  1.6666666666670193e-1);
    t = a * s;
    r = fma (r, t, a);

    return r;
}

/* Compute arccosine (a), maximum error observed: 1.4316 ulp
   Double-precision factorization of π courtesy of Tor Myklebust
*/
double my_acos (double a)
{
    double r;

    r = (a > 0.0) ? -a : a; // avoid modifying the "sign" of NaNs
    if (r > -0.5625) {
        /* arccos(x) = pi/2 - arcsin(x) */
        r = fma (9.3282184640716537e-1, 1.6839188885261840e+0, asin_core (r));
    } else {
        /* arccos(x) = 2 * arcsin (sqrt ((1-x) / 2)) */
        r = 2.0 * asin_core (sqrt (fma (0.5, r, 0.5)));
    }
    if (!(a > 0.0) && (a >= -1.0)) { // avoid modifying the "sign" of NaNs
        /* arccos (-x) = pi - arccos(x) */
        r = fma (1.8656436928143307e+0, 1.6839188885261840e+0, -r);
    }
    return r;
}

score 3 · Accepted Answer · edited May 23 '17 at 10:31

My current accurate shader implementation of 'acos()' is a mix out of the usual Taylor series and the answer from Bence. With 40 iterations I get an accuracy of 4.44089e-16 to the 'acos()' implementation from math.h. Maybe it is not the best, but it works for me:

And here it is:

double myASIN2(double x)
{
    double sum, tempExp;
    tempExp = x;
    double factor = 1.0;
    double divisor = 1.0;
    sum = x;
    for(int i = 0; i < 40; i++)
    {
        tempExp *= x*x;
        divisor += 2.0;
        factor *= (2.0*double(i) + 1.0)/((double(i)+1.0)*2.0);
        sum += factor*tempExp/divisor;
    }
    return sum;
}

double myASIN(double x)
{
    if(abs(x) <= 0.71)
        return myASIN2(x);
    else if( x > 0)
        return (PI/2.0-myASIN2(sqrt(1.0-(x*x))));
    else //x < 0 or x is NaN
        return (myASIN2(sqrt(1.0-(x*x)))-PI/2.0);

}

double myACOS(double x)
{
    return (PI/2.0 - myASIN(x));
}

Any comments, what could be done better? For example using a LUT for the values of factor, but in my shader 'acos()' is just called once so there is no need for it.

Is there an accurate approximation of the acos() function?

2 Answers2