How can I know in advance which real numbers would have an imprecise representation using float variables in C?

Question

I know that the number 159.95 cannot be precisely represented using float variables in C.

For example, considering the following piece of code:

#include <stdio.h>
int main()
{
    float x = 159.95;
    printf("%f\n",x);
    return 0;
}

It outputs 159.949997.

I would like to know if there is some way to know in advance which real value (in decimal system) would be represented in an imprecise way like the 159.95 number.

Best regards.

Basically all numbers that need more than `24` bits, ie... those that are not a multiple of `1/(2^24)` or `.000000059604644775390625` (might be `23` or `25`, I'm not sure; it is different for `float`, `double`, `long double`) ... and it changes with the range -- it's different for large and small numbers (`45234523674` or `0.000000000000000006756542`). Read [IEEE 754 Wikipedia article](https://en.wikipedia.org/wiki/IEEE_754) — pmg, Feb 14 '22 at 21:21
Also, in the absence of a **strong reason** otherwise, always use `double` for floating-point. — pmg, Feb 14 '22 at 21:26
@EugeneSh. Yes. I know it. But I'm just starting to learn how to program and digital stuff. It is not obvious for me, after reading the standard, how to determine when a real number (in decimal system) would be imprecisely represented. Please, let me know what I have missed from the standard. — Zaratruta, Feb 14 '22 at 21:27
@pmg They don't need to be "large". 1.000000059604644775390625 is already large enough to not be representable as a float. — nanofarad, Feb 14 '22 at 21:27
@Zaratruta Naively, you could create a list of *all* possible floats (32 bits, so a bit over 4 billion entries) and look the value up in there. For `double`s that won't be practical though. — Eugene Sh., Feb 14 '22 at 21:29
A simple answer would be to assume they are all imprecise, and work accordingly, especially with comparison tests. And in the example 159.95 posted, if you need 9 significant digits then `float` is inadequate. — Weather Vane, Feb 14 '22 at 21:33
@WeatherVane. Yes. When I develop programs, I assume that. However, I would like to understand the phenomenon in a deeper way. — Zaratruta, Feb 14 '22 at 21:37
Put it this way: of the *infinity* of real values there are, only about 2^32 can be exactly represented as a `float`. — Weather Vane, Feb 14 '22 at 21:39
use printf("%.2f\n", x); for float and printf("%.2lf\n", x); for double. You will get output 159.95 — Ihdina, Feb 14 '22 at 22:41
There are several more answers to this question — I suppose I should vote to close as a duplicate — at [How to determine w/o conversions that a given floating constant can be represented?](https://stackoverflow.com/questions/68835588) — Steve Summit, Feb 14 '22 at 22:42
Given that there are an infinite number of _real values_ that cannot be represented using a _finite_ 32 bit representation, is there really any benefit in picking out the insignificant number of values that can be exactly represented? This is true of any representation - most _real_ values are also not precisely represented in decimal either - no matter how many digits you use. Real values are continuous; any number system representation with a finite number of digits is discrete. — Clifford, Feb 14 '22 at 22:45
@SteveSummit. In my point of view, it is not the same question. My question here is how can we determine in advance which number can be precisely represented. — Zaratruta, Feb 14 '22 at 23:07
@Clifford. The benefit is understanding. And everything that can follow from this. — Zaratruta, Feb 14 '22 at 23:08
@Zaratruta Floating point numbers are fascinating, and rewarding to understand. Two other questions I've found particularly instructive are [bitwise splitting the mantissa of a IEEE 754 double?](https://stackoverflow.com/questions/69207580) and [how can smallest floating point number be 2^(-126) , not 2^(-128)](https://stackoverflow.com/questions/43153699). — Steve Summit, Feb 14 '22 at 23:13
Yes, that is my point. If you understand the maths, you realise the futility of this exercise. _real_ numbers and _decimal_ representation of numbers are not the same thing. You seem to be equating them. The coincidence of an exact decimal with a particular binary representation is not particularly useful. What is useful to understand is how many _decimal significant figures_ are preserved without change in a round-trip conversion from decimal to binary and back. For `float` it is 6. — Clifford, Feb 15 '22 at 07:56
See also https://stackoverflow.com/questions/3310276/decimal-precision-of-floats — Clifford, Feb 15 '22 at 08:03
Yet more good discussion at [Why can't decimal numbers be represented exactly in binary?](https://stackoverflow.com/questions/1089018) — Steve Summit, Feb 25 '22 at 19:31

Eric Postpischil · Answer 1 · 2022-02-15T10:23:41.500

Succinctly, for the format most commonly used for float, a number is exactly representable if and only if it is representable as an integer F times a power of two, 2^E such that:

the magnitude of F is less than 2²⁴, and
–149 ≤ E < 105.
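
As a rough sketch of that condition in C (assuming float is IEEE-754 binary32 and double is binary64; `is_binary32_value` is a hypothetical name, not a library function), one can split a double into an odd integer F and an exponent E and then test the two bounds:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the finite double x equals F * 2^E
   with |F| < 2^24 and -149 <= E < 105 (the binary32 condition). */
bool is_binary32_value(double x)
{
    if (x == 0.0) return true;            /* F = 0 works */
    if (!isfinite(x)) return false;       /* only finite numbers are considered */

    int e;
    double m = frexp(fabs(x), &e);        /* |x| = m * 2^e, 0.5 <= m < 1 */
    uint64_t F = (uint64_t)ldexp(m, 53);  /* exact: a double has 53 significand bits */
    int E = e - 53;                       /* |x| = F * 2^E */

    while ((F & 1) == 0) { F >>= 1; E++; }   /* make F odd, so E is as large as possible */
    if (E < -149) return false;              /* E cannot be raised any further */
    while (E > 104 && F < (1u << 24)) { F <<= 1; E--; }  /* trade exponent for integer bits */
    return E <= 104 && F < (1u << 24);
}
```

For instance, `is_binary32_value(159.625)` is true and `is_binary32_value(159.95)` is false. The argument is already a double, so a decimal literal has been rounded once before the check runs; a decimal that lies extremely close to a float value could therefore be misjudged.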

More generally, C 2018 5.2.4.2.2 specifies the characteristics of floating-point types. A floating-point number is represented as s•b^e•sum(f_k b^−k, 1 ≤ k ≤ p), where:

s is a sign, +1 or −1,
b is a fixed base chosen by the C implementation, often 2,
e is an exponent, which is an integer between a minimum e_min and a maximum e_max, chosen by the C implementation,
p is the precision, the number of base-b digits in the significand, and
f_k are digits in base-b, nonnegative integers less than b.

The significand is the fraction portion of the representation, sum(f_k b^−k, 1 ≤ k ≤ p). It is written as a sum so that we can express the variable number of digits it may have. (p is a variable set by the C implementation, not by the programmer using the C implementation.) When we write out a significand in base b, it can be a numeral, such as .001110101001100101010110₂ for a 24-bit significand in base 2. Note that, in this form (and the sum), the significand has all its digits after the radix point.

To make it slightly easier to tell if a number is in this format, we can adjust the scale so the significand is an integer instead of having digits after the radix point: s•b^(e−p)•sum(f_k b^(p−k), 1 ≤ k ≤ p). This changes the above significand from .001110101001100101010110₂ to 001110101001100101010110₂. Since it has p digits, it is always a non-negative integer less than b^p.

Now we can figure out if a finite number is representable in this format:

Get b, p, e_min, and e_max for the target C implementation. If it uses IEEE-754 binary32 for float, then b is 2, p is 24, e_min is −125, and e_max is 128. When <float.h> is included, these are defined as FLT_RADIX, FLT_MANT_DIG, FLT_MIN_EXP, and FLT_MAX_EXP.
Ignore the sign. Write the absolute value of number as a rational number n/d in simplest form. If it is an integer, let d be 1.
If d is not a power of b, the number is not representable in the format.
If n is a multiple of b greater than or equal to b^p, divide it by b and divide d by b as well (so that n/d is unchanged), repeating until n is not a multiple of b or is less than b^p.
If n is greater than or equal to b^p, the number is not representable in the format.
Let e be such that 1/d = b^(e−p). If e_min ≤ e ≤ e_max, the number is representable in the format. Otherwise, it is not.

Some floating-point formats might not support subnormal numbers, in which f₁ is zero. This is indicated by FLT_HAS_SUBNORM being defined to be zero and would require modifications to the above.
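
The procedure above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `rational_fits_float` and `gcd_u64` are hypothetical names, n and d must fit in 64 bits, b^p must fit in 64 bits, and subnormals are assumed to be supported. Because d has to stay an integer in the code, the sketch tracks k with d = b^k, so "dividing d by b" becomes decrementing k.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* Hypothetical helper: is the rational n/d exactly representable as a float? */
bool rational_fits_float(uint64_t n, uint64_t d)
{
    const uint64_t b = FLT_RADIX;
    uint64_t limit = 1;                        /* b^p */
    for (int i = 0; i < FLT_MANT_DIG; i++) limit *= b;

    if (d == 0) return false;                  /* not a number at all */
    if (n == 0) return true;                   /* zero is always representable */

    uint64_t g = gcd_u64(n, d);                /* write n/d in simplest form */
    n /= g; d /= g;

    int k = 0;                                 /* d must be a power of b: d = b^k */
    while (d % b == 0) { d /= b; k++; }
    if (d != 1) return false;

    /* Move factors of b out of n while n is too large; k-- divides d by b. */
    while (n >= limit && n % b == 0) { n /= b; k--; }
    if (n >= limit) return false;              /* needs more than p digits */

    int e = FLT_MANT_DIG - k;                  /* from 1/d = b^(e-p) */
    return FLT_MIN_EXP <= e && e <= FLT_MAX_EXP;
}
```

With this sketch, 159.95 is passed as `rational_fits_float(15995, 100)` and is rejected because the reduced denominator 20 is not a power of 2, while `rational_fits_float(159625, 1000)` reduces to 1277/8 and is accepted.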

Excellent write-up. For IEEE-754 `binary32` I verified that the stated bounds for `E` are tight. — njuffa, Feb 14 '22 at 22:37
@njuffa: Thanks. There are three or more opportunities for off-by-one errors in those, so I was worried about them. (IEEE-754 exponent bounds are 1−127 = −126 and 254−127 = 127, C expresses them with the significand scaled down by 2, and then there is the scaling to make the significand an integer.) — Eric Postpischil, Feb 14 '22 at 22:49
@njuffa: Of course, I had the exponent sense of d negated in d = b^(e-p), since it is the denominator. I corrected that to 1/d = b^(e-p). Plenty of opportunities for errors, sigh. — Eric Postpischil, Feb 15 '22 at 10:24

nanofarad · Accepted Answer · 2022-02-14T21:31:47.277

Usually, a float is an IEEE754 binary32 float (this is not guaranteed by spec and may be different on some compilers/systems). This data type specifies a 24-bit significand; this means that if you write the number in binary, it should require no more than 24 bits excluding trailing zeros.

159.95's binary representation is 10011111.11110011001100110011... with repeating 0011 forever, so it requires an infinite number of bits to represent precisely with a binary format.
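
The repeating pattern can be reproduced by doubling the fraction over and over and recording each carry. A minimal sketch, where `binary_fraction_digits` is a hypothetical helper and the fraction is passed as small integers num/den with 0 <= num < den:

```c
#include <stddef.h>

/* Hypothetical helper: writes the first `count` binary digits of the
   fraction num/den (0 <= num < den) into out, NUL-terminated, so out
   needs room for count + 1 chars. Returns 1 if the expansion has
   terminated within those digits, 0 otherwise. */
int binary_fraction_digits(unsigned num, unsigned den, char *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        num *= 2;                                  /* shift one binary place left */
        if (num >= den) { out[i] = '1'; num -= den; }
        else            { out[i] = '0'; }
    }
    out[count] = '\0';
    return num == 0;
}
```

Calling it with 95 and 100 yields 1111 followed by 0011 repeating and never reaches a zero remainder, whereas 625 and 1000 stop after 101.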

Other examples:

1073741760 has a binary representation of 111111111111111111111111000000. It has 30 bits in that representation, but only 24 significant bits (since the remainder are trailing zero bits). It has an exact float representation.

1073741761 has a binary representation of 111111111111111111111111000001. It has 30 significant bits and cannot be represented exactly as a float.

0.000000059604644775390625 has a binary representation of 0.000000000000000000000001. It has one significant bit and can be represented exactly.

0.750000059604644775390625 has a binary representation of 0.110000000000000000000001, which is 24 significant bits. It can be represented exactly as a float.

1.000000059604644775390625 has a binary representation of 1.000000000000000000000001, which is 25 significant bits. It cannot be represented exactly as a float.
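
For the integer examples, the count of significant bits can be checked with a few lines of C. This is a sketch only; `significant_bits` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits from the highest set bit down to
   the lowest set bit, i.e. the width after dropping trailing zeros. */
int significant_bits(uint64_t v)
{
    if (v == 0) return 0;
    while ((v & 1) == 0) v >>= 1;      /* drop trailing zero bits */
    int bits = 0;
    while (v != 0) { bits++; v >>= 1; }
    return bits;
}
```

`significant_bits(1073741760)` gives 24 and `significant_bits(1073741761)` gives 30, matching the examples: the first fits a 24-bit significand and the second does not.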

Another factor (which applies to very large and very small numbers) is that the exponent is limited to the -126 to +127 range. With some handwaving around denormal values and other special cases, this generally allows values ranging from roughly 2^-126 to slightly under 2¹²⁸.
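
To make those bounds concrete, here is a small sketch that assumes float is IEEE-754 binary32. The function names are made up for illustration; the results can be compared against FLT_MIN and FLT_MAX from <float.h>.

```c
#include <float.h>
#include <math.h>

/* Smallest positive normal float, written as a power of two: 2^-126. */
double smallest_normal(void) { return ldexp(1.0, -126); }

/* Largest finite float: 24 one-bits scaled up, i.e. 2^128 - 2^104. */
double largest_finite(void) { return ldexp(16777215.0, 104); }
```

On a binary32 implementation, `smallest_normal()` compares equal to FLT_MIN and `largest_finite()` compares equal to FLT_MAX, which is why the upper limit is "slightly under" 2^128.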

score 2 · Answer 3 · answered Feb 14 '22 at 22:01

I would like to know if there is some way to know in advance which real value (in decimal system) would be represented in an imprecise way like the 159.95 number.

In general, floating point numbers can only represent numbers whose denominator is a power of 2.

To check if a number can be represented as a floating-point value (of any floating-point type) at all, take the decimal digits after the decimal point, interpret them as an integer and check whether it is divisible by 5^n, where n is the number of digits:

159.95 => 95, 2 digits => 95%(5*5) = 20 => Cannot be represented as floating-point value

An example that passes the check:

159.625 => 625, 3 digits => 625%(5*5*5) = 0 => Can be represented as floating-point value

You also have to consider the fact that floating-point values only have a limited number of significant bits:

In principle, 123456789 can be represented by a floating-point value exactly (it is an integer), however float does not have enough bits!

To check if an integer value can be represented by float exactly, divide the number by 2 until the result is odd. If the result is < 2^24, the number can be represented by float exactly.

In the case of a rational number, first do the "divisible by 5^n" check described above. Then multiply the number by 2 until the result is an integer. Check if it is < 2^24.
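
Put together, the two checks might look like the sketch below. The helper name `decimal_fits_float` and its calling convention are assumptions made for illustration: the decimal is passed with its point removed, plus the count of digits that followed the point. Testing the whole scaled number against 5^n is equivalent to testing only the fractional digits, because the integer part times 10^n is always a multiple of 5^n. Like the description above, it ignores the exponent range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. `scaled` is the decimal with its point removed and
   `digits` is how many digits followed the point: 159.95 -> (15995, 2). */
bool decimal_fits_float(uint64_t scaled, int digits)
{
    if (digits < 0 || digits > 27) return false;  /* 5^27 still fits in 64 bits */
    uint64_t pow5 = 1;
    for (int i = 0; i < digits; i++) pow5 *= 5;

    if (scaled % pow5 != 0) return false;  /* denominator keeps a factor of 5 */
    uint64_t m = scaled / pow5;            /* the value times 2^digits, an integer */
    if (m == 0) return true;
    while ((m & 1) == 0) m >>= 1;          /* strip factors of 2 */
    return m < (UINT64_C(1) << 24);        /* must fit in 24 bits */
}
```

So `decimal_fits_float(15995, 2)` is false, `decimal_fits_float(159625, 3)` is true, and `decimal_fits_float(123456789, 0)` is false because the odd integer needs more than 24 bits.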

Steve Summit · Answer 4 · 2022-02-14T22:31:45.337

I would like to know if there is some way to know in advance which real value... would be represented in an imprecise way

The short and only partly facetious answer is... all of them!

There are roughly 2^32 = 4294967296 values of type float. And there are an uncountably infinite number of real numbers. So, for a randomly-chosen real number, the chance that it can be exactly represented as a value of type float is 4294967296/∞, which is 0.

If you use type double, there are approximately 2^64 = 18446744073709551616 of those, so the chance that a randomly-chosen real number can be exactly represented as a double is 18446744073709551616/∞, which is again... 0.

I realize I'm not answering quite the question you asked, but in general, it's usually a bad idea to use binary floating-point types as if they were an exact representation of decimal fractions. Attempts to assume that they're ever an exact representation usually lead to trouble. In general, it's best to assume that floating-point types are an imperfect (approximate) realization of real numbers, period (that is, without assuming decimal). If you never assume they're exact (which for true real numbers, they virtually never are), you'll never get into trouble in cases where you thought they'd be exact, but they weren't.

[Footnote 1: As Eric P. reminds in a comment, there's no such thing as a "randomly-chosen real number", which is why this is a partially facetious answer.]

[Footnote 2: I now see your comment where you say that you do assume they are all imprecise, but that you would "like to understand the phenomenon in a deeper way", in which case my answer does you no good, but hopefully some of the others do. I can especially commend Martin Rosenau's answer, which goes straight to the heart of the matter: a rational number is representable exactly in base 2 if and only if its reduced denominator is a pure power of 2, or stated another way, has only 2's in its prime factorization. That's why, if you take any number you can actually store in a float or double, and print it back out using %f and enough digits, with a properly-written printf, you'll notice that the numbers always end in things like ...625 or ...375. Binary fractions are like the English rulers still used in the U.S.: everything is halves and quarters and eighths and sixteenths and thirty-seconds and sixty-fourths.]

Re “for a randomly-chosen real number”: No uniform distribution can be defined on the real numbers (nor even on the integers), so some other distribution must be specified. In infinitely many such distributions, the probability a chosen number is representable is non-zero. — Eric Postpischil, Feb 14 '22 at 22:12
@EricPostpischil I meant to concede that this was a partially-facetious answer. Thanks for the reminder (and the reasoning). — Steve Summit, Feb 14 '22 at 22:21
There are uncountably many different real numbers, but only a countably infinite subset of them can be represented through an algorithm. Most of them, we'll never ever be able to represent in a computer in whatever format. We just can't compute them! Of course, if we're able to give a decimal representation of the real, it falls in the countable set. — aka.nice, Feb 16 '22 at 11:03
