Scott Chamberlain's answer covers nicely the scenario of frequent readers and infrequent writers, by using a ReaderWriterLockSlim. But what if writing is as frequent as reading? The ReaderWriterLockSlim won't help much in this case.
My idea for mitigating this scenario is to move out of the protected section the calculation of the hashcode, and protect only the operations that involve shared state. This should be beneficial in case the cost of calculating the hashcode of the values is not trivial, for example when the values are long strings. Below is an implementation of this idea, for building a bounded concurrent HashSet<T>. This collection uses a HashSet<(T, int)> as underlying storage, with the int property being the precalculated hashcode of the T value:
public class BoundedConcurrentHashSet<T>
{
private readonly HashSet<(T Value, int HashCode)> _set;
private readonly int _boundedCapacity;
private readonly IEqualityComparer<T> _comparer;
public BoundedConcurrentHashSet(int boundedCapacity,
IEqualityComparer<T> comparer = default)
{
_boundedCapacity = boundedCapacity;
_comparer = comparer ?? EqualityComparer<T>.Default;
_set = new(new _Comparer(_comparer));
}
// A comparer that returns the precalculated HashCode
private class _Comparer : IEqualityComparer<(T, int)>
{
private readonly IEqualityComparer<T> _comparer;
public _Comparer(IEqualityComparer<T> comparer) { _comparer = comparer; }
public bool Equals((T, int) x, (T, int) y) => _comparer.Equals(
x.Item1, y.Item1);
public int GetHashCode((T, int) obj) => obj.Item2;
}
public int Count { get { lock (_set) return _set.Count; } }
public bool IsEmpty => Count == 0;
public int BoundedCapacity => _boundedCapacity;
public bool Contains(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set) return _set.Contains((value, hashCode));
}
public bool TryGetValue(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
actualValue = existing.Value; return true;
}
actualValue = default; return false;
}
}
public bool TryAdd(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set)
{
if (_set.Count < _boundedCapacity) return _set.Add((value, hashCode));
return false;
}
}
public bool TryGetOrAdd(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
actualValue = existing.Value; return true;
}
if (_set.Count < _boundedCapacity)
{
bool added = _set.Add((equalValue, hashCode));
Debug.Assert(added);
actualValue = equalValue; return true;
}
actualValue = default; return false;
}
}
public bool TryRemove(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set) return _set.Remove((value, hashCode));
}
public bool TryRemove(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
bool removed = _set.Remove((equalValue, hashCode));
Debug.Assert(removed);
actualValue = existing.Value; return true;
}
actualValue = default; return false;
}
}
public T[] ToArray()
{
lock (_set) return _set.Select(e => e.Value).ToArray();
}
}
The public members of this collection are:
- Properties:
Count, IsEmpty and BoundedCapacity.
- Methods:
Contains, TryGetValue, TryAdd, TryGetOrAdd, TryRemove and ToArray.
A downside of not using internally a HashSet<T> with the same T, is that in case the T is string there is no protection against HashDoS attacks. So this collection should not be used for storing strings of potentially hostile origin.
Below are some benchmarks involving four different implementations, and with different lengths for the string values.
- Lock-Optimized: is the implementation above.
- Lock-Simple: is a simple
HashSet<T>+lock implementation that doesn't precalculate the hashcodes.
- ReaderWriterLockSlim: is Scott Chamberlain's implementation.
- Native: is an implementation that wraps a
ConcurrentDictionary<T, object>. It's not bounded, so it's not a fair contender. It is included here just for reference.
The scenario is the same for all tests: 4 worker threads that randomly and concurrently read (50%) or add (25%) or remove (25%) values from the same hashset. The reported metric is the total number of operations by all workers per second.
| String length |
Lock-Optimized |
Lock-Simple |
ReaderWriterLockSlim |
Native (not bounded) |
| 10 |
3,833,272 |
4,564,383 |
1,978,146 |
10,369,830 |
| 30 |
4,196,021 |
4,419,448 |
2,023,593 |
10,697,999 |
| 100 |
4,024,539 |
3,417,730 |
1,913,043 |
8,365,800 |
| 300 |
3,952,180 |
2,128,451 |
1,551,082 |
4,644,652 |
| 1000 |
1,994,425 |
1,018,591 |
839,897 |
2,110,027 |
The Lock-Simple outperforms the Lock-Optimized for short strings. The Lock-Optimized starts to shine for strings with length 100 and more.