From Java 8, the hashMap modified slightly to have balanced tree instead of linkedlist if more than 8 (TREEIFY_THRESHOLD=8) items on same bucket. is there any reason choosing 8?
would it impact the performance in case it is 9?
From Java 8, the hashMap modified slightly to have balanced tree instead of linkedlist if more than 8 (TREEIFY_THRESHOLD=8) items on same bucket. is there any reason choosing 8?
would it impact the performance in case it is 9?
The use of a balanced tree instead of a linked-list is a tradeoff. In the case of a list, a linear scan must be performed to perform a lookup in a bucket, while the tree allows for log-time access. When the list is small, the lookup is fast and using a tree doesn't actually provide a benefit while around 8 or so elements the cost of a lookup in the list becomes significant enough that the tree provides a speed-up.
I suspect that the use of a tree is intended for the exceptional case where the key hash is catastrophically broken (e.g. many keys collide); while a linear lookup will cause performance to degrade severely the use of a tree mitigates this performance loss somewhat, if the keys are directly comparable.
Therefore, the exact threshold of 8 entries may not be terribly significant: the chance of a tree bin is 0.00000006 assuming good key distribution, so tree bins are obviously used very rarely in such a case. When the hash algorithm is failing catastrophically, then the number of keys in the bucket is far greater than 8 anyway.
This comes at a space penalty since the tree-node must include additional references: four references to tree nodes and a boolean in addition to the fields of a LinkedHashMap.Entry (see its source).
From the comments in the HashMap class source:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).