What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision?
8 Answers
The biggest/largest integer that can be stored in a double without losing precision is the same as the largest possible value of a double. That is, DBL_MAX or approximately 1.8 × 10^308 (if your double is an IEEE 754 64-bit double). It's an integer. It's represented exactly. What more do you want?
Go on, ask me what the largest integer is, such that it and all smaller integers can be stored in IEEE 64-bit doubles without losing precision. An IEEE 64-bit double has 52 bits of mantissa, so I think it's 2^53:
- 2^53 + 1 cannot be stored, because the 1 at the start and the 1 at the end have too many zeros in between.
- Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one.
- 2^53 obviously can be stored, since it's a small power of 2.
Or another way of looking at it: once the bias has been taken off the exponent, and ignoring the sign bit as irrelevant to the question, the value stored by a double is a power of 2, plus a 52-bit integer multiplied by 2^(exponent − 52). So with exponent 52 you can store all values from 2^52 through to 2^53 − 1. Then with exponent 53, the next number you can store after 2^53 is 2^53 + 1 × 2^(53 − 52). So loss of precision first occurs with 2^53 + 1.
-
+1 Good job noticing that the question did not really mean what the asker probably intended and providing both answers ("technically correct" and "probably expected"). – Pascal Cuoq Dec 04 '09 at 18:32
-
Or "messing about" and "trying to help" as I tend to call them :-) – Steve Jessop Dec 04 '09 at 18:34
-
I have just added more precision in the description of my question. I was talking about the biggest "no-floating" integer. – Franck Freiburger Dec 04 '09 at 18:37
-
I bow to your superior analysis of the question. +1, nice one! – Carl Smotricz Dec 04 '09 at 18:39
-
@Soubok, you have to define what "no-floating" is first, because it sure isn't a term with a standard meaning! – Pavel Minaev Dec 04 '09 at 18:46
-
@Carl: *Everybody* bows to Steve "Who the hell is Jon Skeet?" Jessop :) – Dan Moulding Dec 04 '09 at 18:47
-
I bow to Tony the Pony, and no other. – Steve Jessop Dec 04 '09 at 18:51
-
@Pavel Minaev: Yes I know, but if I ask such a question, unfortunately this is because I don't know all standards :) – Franck Freiburger Dec 04 '09 at 19:46
-
You don't mean "all smaller integers", you mean all integers of equal or lesser magnitude. Because there are a lot of negative integers below 2^53 that cannot be represented exactly in a double. – Southern Hospitality Dec 05 '09 at 08:54
-
I do mean smaller, and that's exactly what I mean when I say smaller :-) -1,000,000 is less than 1, but it is not smaller. – Steve Jessop Dec 05 '09 at 13:23
-
@SteveJessop, can you explain the first sentence? why is the biggest integer that can be stored in a double without losing precision is the same as the largest possible value of a double? – Pacerier Sep 21 '13 at 19:14
-
@Pacerier: It's an integer, and its representation as `double` is exact, and it's the largest integer with that property. Hence it answers the title of this question, "biggest integer that can be stored in a double". I don't think I can explain any further, there's only so much mileage in explaining a joke. – Steve Jessop Sep 23 '13 at 18:35
-
@SteveJessop "_Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one_" I couldn't understand this correctly; are you talking about the implicit/hidden bit because I cannot imagine how the exponent gives the 53rd bit. Please clarify. – legends2k Jun 19 '14 at 14:34
-
@legends2k: The exponent tells you the position of the implicit/hidden bit. That is, the exponent in effect gives you the additional bit of precision. – Steve Jessop Jun 19 '14 at 15:05
-
@SteveJessop But isn't the implicit bit always at the beginning? For instance, _1.XXX_ is the significand for [minifloats](http://en.wikipedia.org/wiki/Minifloat) which have 1 sign, 4 exponent and 3 significand bits. – legends2k Jun 19 '14 at 15:13
-
@legends2k: the implicit bit doesn't actually exist in the object representation of the float, that's why it's called "implicit". So yes it's "at the beginning" of the value. What in the object representation tells you where the beginning of the value is? The exponent. – Steve Jessop Jun 19 '14 at 15:33
-
Extra bonus for being a smartass. – Mad Physicist Jul 31 '15 at 14:10
-
You can encode everything smaller than 2^53 because the *exponent value* can go past 2^53, floating the point all the way to the right of the mantissa. Saying that the exponent gives an "extra bit" is confusing. If we had just 5 bits of exponent for example you wouldn't be able to encode integers between 2^33 and 2^53 even with the 52 bits of mantissa + 1 implicit. – Joan Charmant Aug 20 '16 at 14:16
-
What's the related number on the negative side? `-(2^53)`? – Levi Morrison Nov 30 '17 at 20:25
-
@LeviMorrison: yes, by the same arguments but with the sign bit set. – Steve Jessop Jan 10 '18 at 00:24
-
What is the exact value of 2^53? Google calculator just says `9.0071993e+15` but I need the exact value. – Aaron Franke Jan 16 '20 at 01:41
-
@Aaron: um, the exact value is 9007199254740992. As a 64-bit float, this is the number with sign 0, significand bits all zero, and exponent +53. The bit pattern is `0x4340000000000000`. A converter like http://weitz.de/ieee/ might help you see why (others are available, that's just the first one I found). – Steve Jessop May 25 '20 at 22:05
-
@SteveJessop: I want to convert a `long long` (signed 64-bit integer) to a `double`. I want to show an error message if the number in the `long long` is too large or too small to be represented by a `double`. What boundaries should I check for? In your answer you mention 2^53 as the upper limit but what about negative numbers? What is the smallest integer I can store in a `double`? – Andreas Jul 15 '20 at 19:05
-
@Andreas: IEEE floats have a separate sign bit, so they're completely symmetrical about 0. `-2^53` is just `2^53` with the sign bit flipped. `-2^53 - 1` is not representable for the same reason that `2^53 + 1` is not representable. – Steve Jessop Jul 15 '20 at 19:30
-
Having said that, following the smartass theme of my answer to the original question: the smallest integer that `double` can represent is 0. -1 squillion is less than 0, but it isn't smaller, it's a "large negative number" ;-) – Steve Jessop Jul 15 '20 at 19:33
-
In any case you could convince yourself by actually doing the computation: `(double)-9007199254740992LL == ((double)-9007199254740992LL) - 1` is true. `(double)-9007199254740992LL == ((double)-9007199254740992LL) + 1` is false. So, `-2^53` is the point where `double` loses precision. – Steve Jessop Jul 15 '20 at 19:38
-
@SteveJessop: But when I do this: `long long x = 9007199254740992; double y = (double) x; printf("%.14g\n", y);` it prints `9.007199254741e+015` which is `9007199254741000` so 8 more than my initial `long long`. Why is that? – Andreas Jul 15 '20 at 20:26
-
@Andreas: you asked for 14 digits in the format code `%.14g`. You only got 13, but that's because to 14 places it's 9.007199254741**0**e+015, and the trailing 0 isn't printed. Try `%.16g`. – Steve Jessop Jul 15 '20 at 20:30
-
It's interesting that nobody has mentioned this question's connection with JavaScript. JavaScript doesn't have an integer type, everything is a float (double) so this answer gives you the range of integers in JavaScript. – Mark Ransom Mar 28 '21 at 03:06
9007199254740992 (that's 9,007,199,254,740,992 or 2^53) with no guarantees :)
Program
    #include <stdio.h>

    int main(void) {
        /* Start a little below 2^53; starting from 0 gives the same result but takes far longer. */
        double dbl = 9007199254000000.0;
        /* Stop at the first value where adding 1 no longer changes the number. */
        while (dbl + 1 != dbl) dbl++;
        printf("%.0f\n", dbl - 1);
        printf("%.0f\n", dbl);
        printf("%.0f\n", dbl + 1);
        return 0;
    }
Result
    9007199254740991
    9007199254740992
    9007199254740992
-
Assuming it will be 'close' but less than a 2^N, then a faster test is `double dbl = 1; while (dbl + 1 != dbl) dbl *= 2; while (dbl == --dbl);` which yields the same result – Seph Mar 06 '12 at 10:21
-
@Seph what the...? No? `while (dbl == --dbl)` will loop forever or not at all. :) (in this case, not at all, since it is a 2^N). You'll have to approach it from below. It will indeed also result in one less than the expected result (since the one check in the while loop decrements dbl). And it depends on order of execution, if the decrement is done before or after evaluating the left side (which is undefined as far as I know). If it's the former, it'll always be true and loop forever. – falstro Oct 25 '16 at 14:53
-
A weakness of using `while (dbl + 1 != dbl) dbl++;` is that `dbl + 1 != dbl` may be evaluated using `long double` math - consider `FLT_EVAL_METHOD == 2`. This could end in an infinite loop. – chux - Reinstate Monica Sep 25 '18 at 19:27
-
FWIW, the value you quote minus one (in terms of the IEEE 754 double precision representation) has an exponent of 1075 - 1023 = 52 and a mantissa with all (52) one bits after the decimal point. The next value (900...2) then has all (52) zeros after the decimal point in the mantissa and an exponent of 1076 - 1023 = 53. – Andre Holzner Mar 12 '19 at 16:50
-
When compiling this example for 32 bits, the condition `dbl + 1 != dbl` is always true. This is because for 64 bits `gcc` uses `sse` by default, which is not the case for 32 bits. The example behaves the same way on 32 bits when passing `-mfpmath=sse` to `gcc`. – Daniel Da Cunha Sep 10 '19 at 10:15
The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.
This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:
- The sign bit 0 (positive) rather than 1 (negative)
- The maximum exponent 0x7FE (2046, which represents 1023 after the bias is subtracted) rather than 0x7FF (2047, which indicates a NaN or infinity).
- The maximum mantissa 0xFFFFFFFFFFFFF, which is 52 bits all 1.
In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.
The exact decimal value is:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
This is approximately 1.8 × 10^308.
-
What about the largest value that it can represent with all values between it and zero contiguously representable? – Aaron Franke Jan 16 '20 at 01:41
-
@AaronFranke The question didn't ask about contiguous representation, but the answer to that different question has been included in most other answers here, or even wrongly given as the actual answer. It's 2⁵³ (2 to the power of 53). – Simon Biber Apr 29 '20 at 12:57
Wikipedia has this to say in the same context with a link to IEEE 754:
On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.
2^53 is just over 9 * 10^15.
-
@Steve Jessop more or less, that is indeed what I am saying. I have also encountered hardware systems that don't have a FPU that still need to be IEEE-compliant, so that "typical system" stuff doesn't really help me if I come back to here 8 months later and need the same info for my 68K-based microcontroller (assuming it doesn't have a FPU... I can't remember). – San Jacinto Dec 04 '09 at 18:39
-
@San Jacinto - "This is useless" is unduly harsh. The answer is quite useful, just not as useful as it would have been if it included the comment that typical computer systems do indeed use the IEEE 754 representation. – Stephen C. Steel Dec 04 '09 at 18:47
-
@Stephen C. Steel, actually you are correct. Under my scenario, coming back to this at a later time and looking for the IEEE max, it is impossibly ambiguous as to what a 'typical system' is, but there is still merit in the answer besides this complaint. – San Jacinto Dec 04 '09 at 18:50
You need to look at the size of the mantissa. An IEEE 754 64-bit floating-point number (which has a 52-bit mantissa, plus 1 implied bit) can exactly represent integers with an absolute value of less than or equal to 2^53.
1.7976931348623157 × 10^308
http://en.wikipedia.org/wiki/Double_precision_floating-point_format
-
@Carl well, if the integer has zeros beyond to the left, then it is precisely stored. – Wilhelm Dec 04 '09 at 18:27
-
@all you downvoters: 1.7976931348623157 × 10^308 **is** an exact integer. Do you all need to attend remedial math classes or something?? – Dan Moulding Dec 04 '09 at 18:43
-
We're down to semantics here in the discussion of this hopelessly sunk answer. True, that number can be represented exactly and thereby fulfills the letter of the question. But we all know it's a tiny island of exactitude in an ocean of near misses, and most of us correctly interpolated the question to mean "the largest number beyond which precision goes down the drain." Ah, isn't it wonderful that CompSci is an exact science? :) – Carl Smotricz Dec 04 '09 at 18:59
-
This is the correct answer to what was asked. The value DBL_MAX, which is the largest value which the IEEE double can represent IS an exact integer: in binary representation it is 53 ones followed by 971 zeros (and of course its exact expression in decimal notation is 309 digits long). However, the next smaller exact integer in the IEEE representation is 52 ones followed by 972 zeroes (i.e. a gap of 2^971). What the OP probably wanted was the upper limit of integer values that can be represented without gaps, which is 2^53 (as noted in other answers). – Stephen C. Steel Dec 04 '09 at 19:05
-
@Carl: But, given the question that was asked, is this answer *wrong*? No, it certainly isn't. And nobody really *knows* what the OP *meant* except, the OP. So why downvote an otherwise correct answer? Because you feel Jamie's talents of mind-reading aren't up to par? – Dan Moulding Dec 04 '09 at 19:16
-
Is the number given above the exact value of DBL_MAX? I'm not certain it isn't, but the Wikipedia article linked to indicates that it is approximate, and it certainly would be some coincidence for a number determined by the limits of a base-2 representation, to be divisible by quite that many powers of 10. Not that I downvoted this, just in my answer I indicated that my approximation was indeed an approximation. – Steve Jessop Dec 04 '09 at 19:58
-
@StephenC.Steel I know I'm 5yrs late to the party, but Double.MAX_VALUE is 1.797... x 10^308 = (((1 << 54)-1) * (1 << 970)), which is 53 ones followed by only 969 zeroes. The highest the exponent can get while still representing a number is 1022, minus the 52 explicit and the one implicit 1 leaves us with 969. – masterxilo Jun 02 '14 at 18:03
-
@DanMoulding 1.7976931348623157 × 10^308 is an exact integer, but I am pretty sure this particular integer cannot be stored exactly in a double. – Pascal Cuoq Sep 26 '14 at 22:15
-
Note (just in case someone links to this specific answer): the actual number is provided in Simon Biber's answer. – Alexey Romanov Jul 06 '18 at 14:08
-
This answer might be true, but pretty misleading, as chances are high that whoever is interested in the answer, has the question in mind: "what is the _range_ of integers between 0 and Max I can safely store, without losing information". And 10^308 distinct values certainly couldn't be represented on 64 bits. – vmatyi Dec 17 '21 at 15:38
-
@DanMoulding Most of us can tell what the OP meant, partly because of some of the wording, but mainly because the most simple (a.k.a., "literal") interpretation of the question would have much more limited use. If you were asking for the number given in this answer, you would probably be aware how strange the question is, and take care to emphasize that it is actually what you want. Furthermore, the answer should, at the very least, point out that it is answering the literal and likely unintended question, if not actually providing both answers to help all the people who end up here. – cesoid Mar 03 '22 at 20:59
It is true that, for 64-bit IEEE754 double, all integers up to 9007199254740992 == 2^53 can be exactly represented.
However, it is also worth mentioning that all representable numbers beyond 4503599627370496 == 2^52 are integers. Beyond 2^52 it becomes meaningless to test whether or not they are integers, because they are all implicitly rounded to a nearby representable value.
In the range 2^51 to 2^52, the only non-integer values are the midpoints ending with ".5", meaning any integer test after a calculation must be expected to yield at least 50% false answers.
Below 2^51 we also have ".25" and ".75", so comparing a number with its rounded counterpart in order to determine if it may be integer or not starts making some sense.
TLDR: If you want to test whether a calculated result may be integer, avoid numbers larger than 2251799813685248 == 2^51
As others have noted, I will assume that the OP asked for the largest floating-point value such that all whole numbers less than itself are precisely representable.
You can use FLT_MANT_DIG and DBL_MANT_DIG defined in float.h to not rely on the explicit values (e.g., 53):
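    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        /* 1LL rather than 1L: long may be only 32 bits wide, and shifting it by 53 would then be undefined. */
        printf("%d, %.1f\n", FLT_MANT_DIG, (float)(1LL << FLT_MANT_DIG));
        printf("%d, %.1lf\n", DBL_MANT_DIG, (double)(1LL << DBL_MANT_DIG));
        return 0;
    }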
outputs:
24, 16777216.0
53, 9007199254740992.0