1

For my computer science data structures class we had an assignment on hashing and java's HashMaps. Through this assignment I learned that collisions are handled as linked lists for the first 8 nodes, followed by a binary tree (or red black tree) for the rest. Why... Why aren't they just handled as tree for O(log n) efficiency?

The only writes ups that I could find were that when Java 8 released, it increased the efficiency of the chains by handling them this way instead of strictly linked list, which would be O(n). If anyone has any insight on this it would be greatly appreciated.

Thanks, Matt

crickon
  • 11
  • 2
  • 4
    Keeping a tree balanced requires O(log n) insertions and deletions, as opposed to a list's O(1), so it's not strictly always better. The red black tree is really only there to mitigate poorly implemented hash functions (or, I suppose, random bad luck with regards to collisions). Making every bucket a tree would make performance worse in the general case. – Michael Mar 06 '19 at 19:01
  • I don't think I explained my assignment properly. We were forcing collisions to see how HashMap handles them. My question is more of why the whole chain is not handled with a tree. Why are the first 8 nodes linked and treeify only starts when the chain becomes longer than that. – crickon Mar 07 '19 at 00:11
  • Yes. I already answered that. A tree is not always used because a tree is not always best. – Michael Mar 07 '19 at 00:16
  • Correct me if i'm wrong, wouldn't a poorly implemented tree still have the same performance as a LinkedList. – crickon Mar 07 '19 at 00:17
  • No. Like I said, a tree is slower to insert and remove. And anyway, big O does not tell the whole story. Something can be constant time and still be slow (consider `Thread.sleep(10000)`). – Michael Mar 07 '19 at 00:30
  • Also, HashMap is not implemented with a LinkedList, it uses arrays. – Michael Mar 07 '19 at 00:32
  • A tree is no improvement over a list until *log(N) < N/2*, which happens at *N = 8.* – user207421 Mar 07 '19 at 01:44
  • @user207421 would you care to explain that a little further. where does the n/2 come from? – crickon Mar 07 '19 at 02:25
  • Searching a list is going to find the item in *N/2* time on average, as the average item is halfway down the list. – user207421 Mar 07 '19 at 02:55
  • @user207421 Thank you, that's the answer I was looking for – crickon Mar 07 '19 at 16:46

1 Answers1

2

Clarification before I answer your question:

From your question it sounds like the first 8 nodes are always managed by a linkedlist structure and the ones after the 8th node are managed by a tree structure.

But this is how it actually works after it was changed in Java 8:

  • If the bucket contains a small number of nodes (< 8), it uses a linkedlist for all of this bucket's nodes.
  • If the bucket contains a large number of nodes (=> 8), it dynamically replaces it with an ad-hoc implementation of tree map, for all of this bucket's nodes.

Why does it work like this?

You have a more thorough explanaition here, but basically:

Well, previously entries with conflicting keys were simply appended to linked list, which later had to be traversed. Now HashMap promotes list into binary tree, using hash code as a branching variable. If two hashes are different but ended up in the same bucket, one is considered bigger and goes to the right. If hashes are equal (as in our case), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys, but apparently a good practice. If keys are not comparable, don't expect any performance improvements in case of heavy hash collisions.

In the case of high hash collisions, this implementation will improve worst-case performance from O(n) to O(log n).

For reference, this implementation change in HashMap was introdiced by JEP 180

Why aren't they always handled as tree?

They could have, I guess. I see 2 reasons, that combined could answer this:

  • There is a space penalty for using a tree, as explained in this answer.
  • Switching to a tree will be triggered only in rare cases where there are many collisions. A bucket shouldn't usually contain 8 or more nodes. If it does, this means that there has been a very high number of collisions (due to a bad key hashcode implementation or really bad luck). In most cases, where there are low collisions, a simply linked list will suffice since time complexity will tend to be constant 0(1).
Yair Kukielka
  • 9,919
  • 1
  • 35
  • 43