14

The problem of using doubles as keys in maps/sets is floating point precision.

Some people have suggested adding an epsilon in your compare function, but that means your keys will no longer fulfil the necessary strict weak ordering criterion. This means that you will get a different set/map depending on the order of inserting your elements.

In the case where you want to aggregate/combine/merge data based on double values, and are willing to allow a certain level of rounding/epsilon (clearly, you'll have to), is the following solution a good idea?

Convert all the doubles (where we intended as keys) into integers by multiplying them by the precision factor (e.g. 1e8) and rounding to the nearest integer (int)i+0.5(if i>0), then create a set/map that keys off these integers. When extracting the final values of the keys, divide the ints by the precision factor to get the double value back (albeit rounded).

πάντα ῥεῖ
  • 85,314
  • 13
  • 111
  • 183
zhaomin
  • 381
  • 3
  • 13
  • Maybe just use a arbitrary precision library (such as GMP) data type as the key type? – PaulMcKenzie Jun 29 '15 at 17:08
  • Why do you consider floating-point precision a problem when using doubles as a key in a tree or hash table structure? It works as long as you don't use NaNs as keys, and it even works exactly as you'd expect. – tmyklebu Jun 29 '15 at 18:00
  • I wonder what kind of problem could possibly require floating-point values in sets or maps. Sets and maps are a natural fit for discrete data, floating-points values are good for continuous data. – Christian Hackl Jun 29 '15 at 18:54
  • @ChristianHackl: Floating-point numbers *are* discrete data. – tmyklebu Jun 29 '15 at 19:37
  • @tmyklebu: Well, then how many numbers are there between 1.0 and 2.0 according to the C++ standard? – Christian Hackl Jun 29 '15 at 21:23
  • @ChristianHackl: Finitely many, and that's a stupid analogy to begin with. How many `int`s are there larger than 42 according to the C++ standard? – tmyklebu Jun 30 '15 at 00:30
  • @PaulMcKenzie thanks for the suggestion. wouldn't you still run into rounding errors in the mantissa though? – zhaomin Jun 30 '15 at 04:24
  • @tmyklebu the problem with floating point precision is that i can store a value under the key 0.1 for example, but when i look up with a subsequent 0.1 (after going through some arithmetic operations, some precision is lost), it won't be able to find the same value in the map. – zhaomin Jun 30 '15 at 04:28
  • @ChristianHackl let's say i want to count the number of people with a certain score (floating number e.g 99.4) in an examination, or aggregating bids in an auction. – zhaomin Jun 30 '15 at 04:35
  • @tmyklebu: std::numeric_limits::max() - 42 – Christian Hackl Jun 30 '15 at 04:51
  • @zhaomin: Both of those examples should not use floating-point numbers. Exam grades are composed from discrete (integer) numbers, and the other is currency. Floating point values may be handy for the visualisation of such data, but not for its storage. – Christian Hackl Jun 30 '15 at 04:58
  • @ChristianHackl what currency type would you recommend? i suspect you can't get away from using floating point numbers with round at some point, which is why my recommendation is to convert the floats into a discrete integer (after scaling up and rounding). thoughts? – zhaomin Jun 30 '15 at 07:18
  • 1
    @zhaomin: Don't use floating-point numbers as keys in a dictionary if you're fuzzy on what they are. Also, don't use them as keys in a dictionary if you want unequal numbers to compare equal. There is nothing wrong with using doubles as keys. ("Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?") – tmyklebu Jun 30 '15 at 12:37
  • @tmyklebu: while floating point numbers are in fact discrete, they are generally used to describe continuous processes or data. – Rudy Velthuis Jun 30 '15 at 13:33
  • @zhaomin: why don't you use a rather simplified integer value, like your suggestion derived from the double to serve as key, but store the original double and its associated value both as a different value (e.g. in a struct). IOW, the key for the dictionary is the integer derived rom the double, and the value for that key contains both the original double and its associated value. hat way the key allows for quick searches and the stored original double still exists (unmodified) for further calculations. – Rudy Velthuis Jun 30 '15 at 13:36

2 Answers2

3

"Convert all the doubles (where we intended as keys) into integers by multiplying them by the precision factor (e.g. 1e8) and rounding to the nearest integer (int)i+0.5(if i>0), then create a set/map that keys off these integers. When extracting the final values of the keys, divide the ints by the precision factor to get the double value back (albeit rounded)."

I would recommend using integer type keys (e.g. long long) for the map in first place, and trim them for double representation using a fixed precision for division.

But that depends, if you are able to apply fix point math for your actual use case. If you need to cover a wide range of value precisions (like e.g. +-1e-7 - +-1e7), such approach won't work.

Community
  • 1
  • 1
πάντα ῥεῖ
  • 85,314
  • 13
  • 111
  • 183
  • thanks for the suggestion but how exactly it is different from my solution (apologies that i didn't understand it). you're suggesting to use integer keys (scaled and rounded like i suggested?) in the map, and divided by the same scale to get the double values back? – zhaomin Jun 30 '15 at 04:52
  • @zhaomin There's not much difference to what you were describing, I just wanted to encourage you going that direction. Also note that fixed floating point values can easily be encapsulated in to a class. There's even a standard implementation for doing such: [`std::ratio`](http://en.cppreference.com/w/cpp/numeric/ratio) – πάντα ῥεῖ Jun 30 '15 at 16:36
  • thanks. unfortunately i'm looking at getting these numbers during runtime so std::ratio will not work. but thanks for the suggestion – zhaomin Jul 02 '15 at 06:13
2

Convert all the doubles (where we intended as keys) into integers by multiplying them by the precision factor (e.g. 1e8) and rounding to the nearest integer (int)i+0.5(if i>0), then create a set/map that keys off these integers. When extracting the final values of the keys, divide the ints by the precision factor to get the double value back (albeit rounded).

Instead of dividing by the precision factor to get the doubles back, simply store the double together with the associated value in a struct, and put that struct in the dictionary as the "value" for that integer key. That way, the original double value is still around and can be used for calculations. Just not for the key search.

If, however, you can live with slightly rounded values (due to the fact you simply divide an integer by an epsilon), your suggested approach is already good enough.

As the other answer says, it very much depends on the range of the values. If some are extremely huge and others are extremely small, then your approach to get integer keys won't work. If they are only a few digits apart, then it might.

Rudy Velthuis
  • 27,909
  • 4
  • 45
  • 87
  • thanks for the suggestion. i'd actually prefer to use the rounded value because that better represents the set of doubles that get rounded to it, rather than choosing the first double that was put into the set. – zhaomin Jul 02 '15 at 06:16