
I just discovered this behavior when using Dictionary in C#: after I remove a key from the dictionary and then add an entry using the same key, the newly added key is not at the last index of the dictionary.

Dictionary<string, byte> test = new Dictionary<string, byte>();

test.Add("c", 1);  // [{"c", 1}]
test.Add("b", 2);  // [{"c", 1}, {"b", 2}]
test.Add("a", 3);  // [{"c", 1}, {"b", 2}, {"a", 3}]
test.Remove("b");  // [{"c", 1}, {"a", 3}]

test.Add("b", 2);  // [{"c", 1}, {"b", 2}, {"a", 3}] <= why this happen?
                   // [{"c", 1}, {"a", 3}, {"b", 2}] and not this?

May I know why this happens, and how I can make the newly added key appear at the last position of the dictionary?

Tim

2 Answers


Dictionaries are hash tables. If you look at the definition of a hash table, you'll notice that hash tables are unordered.

It's been some time since I looked at the specific details of the .NET dictionary implementation, so there might be some errors in the rest of my story -- but this is what I remember of the details:

There are a lot of different schemes for implementing hash tables, but the one that .NET uses works like the 'Open Addressing' algorithm with some variations. Basically, new items are added to a list (at the end), and the hash table (a static array) holds pointers into this list. That's why it actually seems to preserve the order.

At some point the data will become filled with 'garbage', due to modifications or growth. At that point, the implementation will do a rehash. If I recall correctly, that is also the point at which it will check if there are too many collisions -- and if that's the case, it'll use a random prime to multiply all hash values with (thereby reducing the number of collisions). It's quite elegant really.

Since the open addressing scheme points to elements in a list, order in the list is not important. When you enumerate a dictionary, you basically look at this list.

You might wonder why it's not enumerating the array of hash codes instead. Well, hash tables are normally over-allocated, and the data is stored in another list anyway, so this alternative would be far less efficient. If you enumerated the hash table, you would probably get a more consistent result -- but because of collisions it still wouldn't be completely consistent. (E.g., if A and B land on the same hash code, the order of insertion would decide whether A follows B or vice versa.)

If you're looking at algorithms like 'set union' that require a consistent ordering, I suggest using a container like SortedDictionary instead.
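For example, replaying the remove/re-add sequence from the question with a SortedDictionary (a sketch; SortedDictionary orders entries by key, so the enumeration order is stable regardless of add/remove history):

```csharp
using System;
using System.Collections.Generic;

class SortedDictionaryDemo
{
    static void Main()
    {
        // SortedDictionary keeps entries ordered by key, so enumeration
        // order does not depend on insertion or removal history.
        var test = new SortedDictionary<string, byte>();
        test.Add("c", 1);
        test.Add("b", 2);
        test.Add("a", 3);
        test.Remove("b");
        test.Add("b", 2);

        Console.WriteLine(string.Join(", ", test.Keys)); // a, b, c
    }
}
```

Note this gives you key order, not insertion order -- but it is at least a consistent, documented order.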

atlaste

You can see the implementation code of the Dictionary class over here

As you can see, the implementation keeps track of a list of free positions in the entries array, and when a new value is added, those free entries are used first.
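You can see the effect of that free list by replaying the question's sequence (the slot numbers in the comments describe the internal entries array as I understand it; the resulting enumeration order is an observed implementation detail, not documented behavior):

```csharp
using System;
using System.Collections.Generic;

class FreeListDemo
{
    static void Main()
    {
        var test = new Dictionary<string, byte>();
        test.Add("c", 1); // occupies entry slot 0
        test.Add("b", 2); // occupies entry slot 1
        test.Add("a", 3); // occupies entry slot 2
        test.Remove("b"); // slot 1 goes onto the free list
        test.Add("b", 2); // reuses slot 1 instead of appending

        // Observed on current .NET runtimes: c, b, a.
        // Dictionary enumeration order is undocumented -- don't rely on it.
        Console.WriteLine(string.Join(", ", test.Keys));
    }
}
```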

There is a non-generic ListDictionary class in the framework that, I believe, always adds new items at the end of the list. Keep in mind that access to that IDictionary implementation will typically be O(n) on average, in contrast to the O(1) average of the generic Dictionary you are currently using.
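A quick sketch with ListDictionary (non-generic, so values are boxed; the expected order assumes it appends new keys at the tail of its internal linked list, as described above):

```csharp
using System;
using System.Collections;
using System.Collections.Specialized; // ListDictionary lives here

class ListDictionaryDemo
{
    static void Main()
    {
        // ListDictionary is backed by a singly linked list; re-adding "b"
        // after removal appends it at the tail, preserving insertion order.
        var test = new ListDictionary();
        test.Add("c", (byte)1);
        test.Add("b", (byte)2);
        test.Add("a", (byte)3);
        test.Remove("b");
        test.Add("b", (byte)2);

        foreach (DictionaryEntry entry in test)
            Console.WriteLine($"{entry.Key} = {entry.Value}");
        // c = 1, a = 3, b = 2
    }
}
```

The O(n) lookup cost only matters for larger collections; the docs recommend ListDictionary for dictionaries with roughly ten items or fewer.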

Fede