Using
from collections import defaultdict
Assume the following data structure:
amazing_dict = defaultdict(lambda: defaultdict(amazing_dict.default_factory))
which can also be written as:
generate_amazing_dict = lambda: defaultdict(generate_amazing_dict)
amazing_dict = generate_amazing_dict()
amazing_dict must be thread-safe when new keys are created, meaning that no two threads may generate the default value for the same missing key.
For example, running amazing_dict["a"]["x"] = 0 and amazing_dict["a"]["y"] = 0 on two different threads must always result in amazing_dict == {"a": {"x": 0, "y": 0}}; one thread must never overwrite the amazing_dict["a"] created by the other.
According to this great answer, defaultdict on its own is thread-safe. However, once default_factory is implemented in Python code (as in amazing_dict), a thread switch can occur after the key is found missing but before the factory's result is stored, so two threads can both call the factory for the same key.
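To make the race concrete, here is a minimal sketch of it with a plain defaultdict. The names slow_factory, calls and barrier are hypothetical helpers of mine, not part of the real code; the barrier is only there to force both threads to be inside the factory at the same time, so the outcome does not depend on scheduling luck.

```python
import threading
from collections import defaultdict

calls = []                                  # one entry per factory invocation
barrier = threading.Barrier(2, timeout=5)   # forces the two factory calls to overlap

def slow_factory():
    calls.append(threading.get_ident())
    barrier.wait()   # block until both threads are inside the factory
    return {}

d = defaultdict(slow_factory)
results = []

def worker():
    results.append(d["a"])   # "a" is missing, so the factory is called

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))                 # 2: the factory ran once per thread
print(results[0] is results[1])   # False: each thread got its own default value
```

Both threads see the key as missing, both build a value, and the later store replaces the earlier one, which is exactly the override I want to rule out.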
I want to make sure the call to the factory is always made under a lock. I have come up with two possible implementations that might provide thread safety for the factory call.
Option A - override the __missing__ method of defaultdict so that it runs under a lock and re-checks whether the key already exists in self before calling the factory.
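import threading

class threadsafe_defaultdict(defaultdict):
    def __init__(self, default_factory=None, **kwargs) -> None:
        # Unpack kwargs so they are forwarded as keyword arguments (initial items).
        super().__init__(default_factory, **kwargs)
        self._missing_lock = threading.Lock()

    def __missing__(self, key):
        with self._missing_lock:
            # Re-check under the lock: another thread may have created the key meanwhile.
            if key in self:
                return self[key]
            return super().__missing__(key)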
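A minimal sketch of how I would check Option A, assuming a deliberately slow factory (slow_factory and calls are hypothetical test helpers) so that several threads hit the same missing key while the first factory call is still running:

```python
import threading
import time
from collections import defaultdict

class threadsafe_defaultdict(defaultdict):
    def __init__(self, default_factory=None, **kwargs):
        super().__init__(default_factory, **kwargs)
        self._missing_lock = threading.Lock()

    def __missing__(self, key):
        with self._missing_lock:
            if key in self:          # created by another thread while we waited
                return self[key]
            return super().__missing__(key)

calls = []   # one entry per factory invocation

def slow_factory():
    calls.append(1)
    time.sleep(0.05)   # widen the window in which other threads see the key missing
    return {}

d = threadsafe_defaultdict(slow_factory)
results = []

def worker():
    results.append(d["a"])

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))                              # 1
print(all(r is results[0] for r in results))   # True
```

One thing I noticed while writing this: threading.Lock is not reentrant, so a factory that itself reads a missing key from the same dict instance would deadlock. That does not happen with amazing_dict, since each nested dict has its own lock, but threading.RLock might be the safer choice in general.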
Option B - override the __getitem__ method of dict: if the key already exists, call the super method directly; otherwise, call the super method under a lock.
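import threading

class threadsafe_defaultdict(defaultdict):
    def __init__(self, default_factory=None, **kwargs) -> None:
        # Unpack kwargs so they are forwarded as keyword arguments (initial items).
        super().__init__(default_factory, **kwargs)
        self._missing_lock = threading.Lock()

    def __getitem__(self, key):
        # Fast path: the key already exists, so no lock is needed.
        if key in self:
            return super().__getitem__(key)
        # Slow path: look up again under the lock; the factory runs only if the key is still missing.
        with self._missing_lock:
            return super().__getitem__(key)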
In both cases amazing_dict would be generated as follows:
generate_amazing_dict = lambda: threadsafe_defaultdict(generate_amazing_dict)
amazing_dict = generate_amazing_dict()
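For completeness, a minimal sketch of the two-thread scenario from the example above, here using Option B. The time.sleep inside generate_amazing_dict and the set_leaf helper are assumptions added only for this test, to make the two lookups of "a" overlap.

```python
import threading
import time
from collections import defaultdict

class threadsafe_defaultdict(defaultdict):
    def __init__(self, default_factory=None, **kwargs):
        super().__init__(default_factory, **kwargs)
        self._missing_lock = threading.Lock()

    def __getitem__(self, key):
        if key in self:                      # fast path, no lock
            return super().__getitem__(key)
        with self._missing_lock:             # slow path, factory runs under the lock
            return super().__getitem__(key)

def generate_amazing_dict():
    time.sleep(0.01)   # test-only delay to widen the race window
    return threadsafe_defaultdict(generate_amazing_dict)

amazing_dict = generate_amazing_dict()

def set_leaf(leaf):
    amazing_dict["a"][leaf] = 0   # the lookup of "a" may trigger the factory

threads = [threading.Thread(target=set_leaf, args=(leaf,)) for leaf in ("x", "y")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(amazing_dict == {"a": {"x": 0, "y": 0}})   # True
```

A single passing run like this does not prove either option correct, of course; it only shows the behaviour I am after.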
Which of the two implementations is more correct (if either)? Further suggestions for achieving thread safety in this case are also welcome.