Why is the function floor giving different results in this case?

Question

In this example, the behaviour of floor differs and I do not understand why:

printf("floor(34000000.535 * 100 + 0.5) : %lf \n", floor(34000000.535 * 100 + 0.5));
printf("floor(33000000.535 * 100 + 0.5) : %lf \n", floor(33000000.535 * 100 + 0.5));

The output for this code is:

floor(34000000.535 * 100 + 0.5) : 3400000053.000000
floor(33000000.535 * 100 + 0.5) : 3300000054.000000

Why does the first result not equal to 3400000054.0 as we could expect?

Rounding issues of course, use a library for arbitrary precision math (for example gmp) if you need exact calculations. — Ctx, Dec 30 '16 at 10:45
@cHao Note that even with _decimal_ [floating point](https://en.wikipedia.org/wiki/Decimal64_floating-point_format), issues like this can occur with values near the precision limits of the format. Such is the properties of floating point arithmetic, in any base. It is certainly more common with decimal text and binary `double`. — chux - Reinstate Monica, Dec 30 '16 at 12:08

chux - Reinstate Monica · Accepted Answer · 2016-12-31T14:09:42.460

double in C does not represent every possible number that can be expressed in text.

double can typically represent about 2⁶⁴ different numbers. Neither 34000000.535 nor 33000000.535 are in that set when double is encoded as a binary floating point number. Instead the closest representable number is used.

Text             34000000.535
closest double   34000000.534999996423...
Text             33000000.535
closest double   33000000.535000000149...

With double as a binary floating point number, multiplying by a non-power-of-2, like 100.0, can introduce additional rounding differences. Yet in these cases, it still results in products, one just above xxx.5 and another below.

Adding 0.5, a simple power of 2, does not incurring rounding issues as the value is not extreme compared to 3x00000053.5.

Seeing intermediate results to higher print precision well shows the typical step-by-step process.

#include <stdio.h>
#include <float.h>
#include <math.h>

 void fma_test(double a, double b, double c) {
   int n = DBL_DIG + 3;
   printf("a b c      %.*e %.*e %.*e\n", n, a, n, b, n, c);
   printf("a*b        %.*e\n", n, a*b);
   printf("a*b+c      %.*e\n", n, a*b+c);
   printf("a*b+c      %.*e\n", n, floor(a*b+c));
   puts("");
 }

int main(void) {
  fma_test(34000000.535, 100, 0.5);
  fma_test(33000000.535, 100, 0.5);
}

Output

a b c      3.400000053499999642e+07 1.000000000000000000e+02 5.000000000000000000e-01
a*b        3.400000053499999523e+09
a*b+c      3.400000053999999523e+09
a*b+c      3.400000053000000000e+09

a b c      3.300000053500000015e+07 1.000000000000000000e+02 5.000000000000000000e-01
a*b        3.300000053500000000e+09
a*b+c      3.300000054000000000e+09
a*b+c      3.300000054000000000e+09

The issue is more complex then this simple answers as various platforms can 1) use higher precision math like long double or 2) rarely, use a decimal floating point double. So code's results may vary.

chux, you made a mistake by saying that there are 2^64 different numbers represented. In reality, there are many less. Because there is the exponent part inside those 64, the same kind of representation as in the case of `float` that I analysed in my answer. A compiler can represent the number 100 as 10e2 or 100e1. The exponent part has no `property` about how a compiler should set it. See my answer that prints how a `double`/`long double`/`float` are represented. So when you write x=100 you cannot be sure neither what representation the compiler will give, nor the floating point hardware — alinsoar, Dec 31 '16 at 12:09
@alinsoar This answer did not say there are 2^64 different numbers. This answer, before your unnecessary edit, said "about 2^64 different numbers". In reality there are **not** many less values. A chunk of the 2^64 different bit patterns are Not-A-Numbers which only reduces the value pattern count less than 0.1%. Your exponent discussion of printing 100 as 10e2 or 100e1, still just one of the count of the 2^64 patterns, is irrelevant. Do not edit this answer - that edit and comment intent misleads the correct intent. — chux - Reinstate Monica, Dec 31 '16 at 14:25
you could have used "less than", as it is more correct than "about". — alinsoar, Dec 31 '16 at 16:39
@alinsoar The difference between "about 2^64 different numbers" and "less than 2^64 different numbers" is insignificant relative to OP's issue. The "about" has the advantage of indicating closeness, "less than" does not. Could has used "nearly 2^64 different numbers" to convey both thoughts, but again, it is not core to OP issue. — chux - Reinstate Monica, Dec 31 '16 at 22:52

score 0 · Answer 2 · edited May 23 '17 at 12:08

0

Question has been already answered here.

In basic float numbers are just approximation. If we have program like this:

float a = 0.2 + 0.3;
float b = 0.25 + 0.25;

if (a == b) {
    //might happen
}
if (a != b) {
    // also might happen
}

The only guaranteed thing is that a-b is relatively small.

edited May 23 '17 at 12:08

Community

1
1

answered Dec 30 '16 at 11:20

Peter Hrvola

9
2
3

score -1 · Answer 3 · edited May 23 '17 at 12:24

-1

Using the code that shows the representation of floats in memory as sum of terms, we get:

main()
{
    float x=floor(34000000.535 * 100 + 0.5);
    float y=floor(33000000.535 * 100 + 0.5);
    xx(&x);
    xx(&y);
    yy(x);
    yy(y);
}

This code will output the representation in memory of the values returned by floor in both cases.

Using the bc calcultor, we can see that the approximation is indeed good, but there are some perturbations due to math behind floor representation.

Note: I did set scale=20 in bc, which means, each intermediary computation keeps 20 digits after the point.

./a.out
1ST NUMBER=>    sign:0 exponent:1 0 0 1 1 1 1 fraction:0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 1 1 1 0 0 0 1 0
2ND NUMBER=>    sign:0 exponent:1 0 0 1 1 1 1 fraction:0 1 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1
1ST NUMBER=>    positive ( 1+1/(2) +1/(16) +1/(64) +1/(256) +1/(1024) +1/(8192) +1/(16384) +1/(32768) +1/(65536) +1/(131072) +1/(262144) +1/(4194304) )*2^31
2ND NUMBER=>    positive ( 1+1/(2) +1/(32) +1/(256) +1/(1024) +1/(2048) +1/(16384) +1/(8388608) )*2^31
@ bc
scale=20
( 1+1/(2) +1/(16) +1/(64) +1/(256) +1/(1024) +1/(8192) +1/(16384) +1/(32768) +1/(65536) +1/(131072) +1/(262144) +1/(4194304) )*2^31
3399999999.99999999999463129088
 ( 1+1/(2) +1/(32) +1/(256) +1/(1024) +1/(2048) +1/(16384) +1/(8388608) )*2^31
3299999999.99999999999731564544

edited May 23 '17 at 12:24

Community

1
1

answered Dec 30 '16 at 11:38

alinsoar

14,813
4
53
68

I think you are mixing something up here. In the original question there were two different numbers (34.... vs 33....), which is the root cause of the issue. You have the same number two times which of course gives the same result when applying `floor()` to it. The printf implementation doesn't come into account here. – Ctx Dec 30 '16 at 11:57
@Ctx thanks. I did not notice that the numbers were different. Corrected. – alinsoar Dec 30 '16 at 12:07
`void main` invokes undefined behavior. – Roland Illig Dec 31 '16 at 13:26
@RolandIllig here is about outputting representation of floats, what main returns is not significant in this problem. Also, not return type is gnu-extension, but this is not about returning an error code. I think you have difficulties to understand what the problem was. – alinsoar Dec 31 '16 at 13:57
Why set _scale=20_ rather than match [typical `float`](https://en.wikipedia.org/wiki/Single-precision_floating-point_format#IEEE_754_single-precision_binary_floating-point_format:_binary32) which has 24 bits of binary precision? With _scale=24_, `bc` reports `scale=24 ( 1+1/(2) +1/(16) +1/(64) +1/(256) +1/(1024) +1/(8192) +1/(16384) +1/(32768) +1/(65536) +1/(131072) +1/(262144) +1/(4194304) )*2^31 3400000000.000000000000000000000000` which matches the mathematical sum of those fractions. – chux - Reinstate Monica Dec 31 '16 at 14:53
@chux. You are right :) For 24-scale precision, bc prints a result much closer to printf. Why printf is not able to print the same as bc does ? Because both receive the same stream of data bits. – alinsoar Dec 31 '16 at 15:24
`printf()` is able to print the same as bc does. What value and what format are you using when `printf()` apparently does not work as desired? – chux - Reinstate Monica Dec 31 '16 at 15:27
Try `float f =( 1+1/(2.f) +1/(16.f) +1/(64.f) +1/(256.f) +1/(1024.f) +1/(8192.f) +1/(16384.f) +1/(32768.f) +1/(65536.f) +1/(131072.f) +1/(262144.f) +1/(4194304.f) ); for (int i=0; i<31; i++) f *= 2.0f; printf("%a %.20f\n", f,f);` --> `0x1.954fc4p+31 3400000000.00000000000000000000` – chux - Reinstate Monica Dec 31 '16 at 15:30
@chux this is about the internal representation of all floating numbers, not about how printf works, but about what data printf receives (as stream of bits representing a real number, etc). – alinsoar Dec 31 '16 at 16:38
@alinsoar Agree this is about the internal representation. I was replying about the issue of how `printf()` handles that data of your [comment](http://stackoverflow.com/questions/41394851/why-is-the-function-floor-giving-different-results-in-this-case/41395654?noredirect=1#comment70021999_41395654). `printf("%a", x);` shows very clearly what `pritnf()` received. – chux - Reinstate Monica Dec 31 '16 at 17:05

Why is the function floor giving different results in this case?

3 Answers3