
Since Java 8, HashMap uses a balanced tree instead of a linked list when more than 8 (TREEIFY_THRESHOLD = 8) items land in the same bucket. Is there a reason for choosing 8?

Would performance be affected if the threshold were 9?

Muthu
  • A linked list is faster than a tree until *log(N) < N/2*, which happens at *N = 8*. – user207421 Mar 14 '19 at 05:26
  • Duplicate of [Why doesn't java's HashMap just use trees for collision chaining?](https://stackoverflow.com/questions/55030269/why-doesnt-javas-hashmap-just-use-trees-for-collision-chaining#comment96822932_55030269). – user207421 Mar 14 '19 at 05:30

1 Answer


The use of a balanced tree instead of a linked list is a tradeoff. With a list, a lookup in a bucket requires a linear scan, while a tree allows log-time access. When the list is small, the scan is fast and a tree provides no real benefit. At around 8 elements, the cost of scanning the list becomes significant enough that the tree provides a speed-up.

I suspect that the tree is intended for the exceptional case where the key hash is catastrophically broken (e.g. many keys collide). A linear lookup would cause performance to degrade severely in that case, while a tree mitigates the loss somewhat, provided the keys are directly comparable.
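To make the "catastrophically broken hash" case concrete, here is a minimal sketch. `BadKey` and `CollidingKeys` are hypothetical names invented for this illustration; the key deliberately returns a constant `hashCode` so every entry lands in the same bucket, and it implements `Comparable` so the keys are directly comparable. The public API does not expose whether a bin has been treeified, so the sketch only shows that lookups stay correct, not how the bin is stored.

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeys {
    // Hypothetical key type: every instance hashes to the same bucket.
    public static final class BadKey implements Comparable<BadKey> {
        final int id;

        public BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately broken: all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map whose n entries all share one hash code.
    public static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

Timing this against a variant of `BadKey` that does not implement `Comparable` is one way to see the effect on a particular JVM, though the results will depend on the runtime and hardware.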

Therefore, the exact threshold of 8 entries may not be terribly significant. With well-distributed keys, the probability of a bin holding 8 entries is about 0.00000006, so tree bins are used very rarely in that case. When the hash function fails catastrophically, the number of keys in the bucket is far greater than 8 anyway.

The tree comes at a space penalty, since each tree node must hold additional fields: four references to other tree nodes and a boolean, on top of the fields of a LinkedHashMap.Entry (see its source).

From the comments in the HashMap class source:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).
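The formula at the end of that comment can be evaluated directly. The following is a small sketch, assuming the Poisson parameter of 0.5 quoted above; the class name `PoissonBins` and method name `probability` are made up for this example.

```java
public class PoissonBins {
    // Expected fraction of bins holding exactly k nodes, using
    // exp(-0.5) * pow(0.5, k) / factorial(k) from the quoted comment.
    public static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // Print the expected frequency for bin sizes 0 through 8.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```

For k = 8 this evaluates to roughly 0.00000006, which is where the figure quoted earlier in this answer comes from.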

nanofarad