Using

import threading
from collections import defaultdict

Assume the following data structure:

amazing_dict = defaultdict(lambda: defaultdict(amazing_dict.default_factory))

which can also be written as:

generate_amazing_dict = lambda: defaultdict(generate_amazing_dict)
amazing_dict = generate_amazing_dict()

I need amazing_dict to be thread-safe when new keys are created, meaning that no two threads may generate the default value for the same missing key.

For example, running amazing_dict["a"]["x"] = 0 and amazing_dict["a"]["y"] = 0 on two different threads must always result in amazing_dict == {"a": {"x": 0, "y": 0}}; one thread must never overwrite the amazing_dict["a"] created by the other.

According to this great answer, defaultdict on its own is thread-safe. However, once default_factory is Python code (as in amazing_dict), a thread switch can occur after the key is found missing and before the factory's result is stored.
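To make the concern concrete, here is a minimal sketch that forces the interleaving on a plain defaultdict. The names slow_factory, created, seen and worker are made up for this illustration, and the Barrier is only there to hold both factory calls open at the same time. On CPython I would expect it to report two factory calls for the single key "a", with each thread holding a different object.

```python
import threading
from collections import defaultdict

# Barrier(2) only releases once two factory calls are in flight at once,
# i.e. once two threads have both seen the same key as missing.
barrier = threading.Barrier(2)
created = []  # every object the factory produced


def slow_factory():
    value = {}
    created.append(value)
    try:
        barrier.wait(timeout=5)  # hold here until the other thread arrives
    except threading.BrokenBarrierError:
        pass  # the other thread never arrived; carry on alone
    return value


plain = defaultdict(slow_factory)
seen = []  # what each thread got back from plain["a"]


def worker():
    seen.append(plain["a"])


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created))        # number of factory calls made for the key "a"
print(seen[0] is seen[1])  # whether both threads hold the same object
```

Only one of the two created objects ends up stored under plain["a"], so anything written to the other one is lost, which is exactly the override I want to rule out.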

I want to make sure the factory is always called under a lock. I have come up with two possible implementations that might make the factory call thread-safe.

Option A - override the __missing__ method of defaultdict so that it runs under a lock and checks again whether the key exists in self before calling the factory.

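class threadsafe_defaultdict(defaultdict):

    def __init__(self, default_factory=None, **kwargs) -> None:
        # Forward the initial items as keyword arguments.
        super().__init__(default_factory, **kwargs)
        # One lock per instance guards the creation of missing keys.
        self._missing_lock = threading.Lock()

    def __missing__(self, key):
        with self._missing_lock:
            # Re-check under the lock: another thread may have created the key.
            if key in self:
                return self[key]
            return super().__missing__(key)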

Option B - override the __getitem__ method to check whether the key exists; if it does, call the super method normally, otherwise call the super method under a lock.

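class threadsafe_defaultdict(defaultdict):

    def __init__(self, default_factory=None, **kwargs) -> None:
        # Forward the initial items as keyword arguments.
        super().__init__(default_factory, **kwargs)
        # One lock per instance guards the creation of missing keys.
        self._missing_lock = threading.Lock()

    def __getitem__(self, key):
        # Fast path: the key already exists, so no lock is taken.
        if key in self:
            return super().__getitem__(key)
        # Slow path: repeat the lookup under the lock; __missing__ only runs
        # if the key is still absent.
        with self._missing_lock:
            return super().__getitem__(key)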

In both cases amazing_dict would be generated as follows:

generate_amazing_dict = lambda: threadsafe_defaultdict(generate_amazing_dict)
amazing_dict = generate_amazing_dict()
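For reference, this is roughly how I would try to exercise either option. It is a sketch only, shown here with Option A; factory_calls, worker, names and the sleep inside the factory are assumptions added for the test, not part of the real code. A passing run does not prove thread safety, it just fails to find a counterexample.

```python
import threading
import time
from collections import defaultdict


class threadsafe_defaultdict(defaultdict):
    """Option A: __missing__ runs under a per-instance lock."""

    def __init__(self, default_factory=None, **kwargs) -> None:
        super().__init__(default_factory, **kwargs)
        self._missing_lock = threading.Lock()

    def __missing__(self, key):
        with self._missing_lock:
            if key in self:
                return self[key]
            return super().__missing__(key)


factory_calls = []  # one entry per factory invocation


def generate_amazing_dict():
    factory_calls.append(1)
    time.sleep(0.01)  # widen the window in which another thread could race
    return threadsafe_defaultdict(generate_amazing_dict)


amazing_dict = generate_amazing_dict()


def worker(name):
    # Every thread goes through the same missing key "a".
    amazing_dict["a"][name] = 0


names = [f"k{i}" for i in range(8)]
threads = [threading.Thread(target=worker, args=(n,)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Convert to plain dicts so the result is easy to compare and print.
result = {key: dict(value) for key, value in amazing_dict.items()}
print(result)
print(len(factory_calls))  # root dict plus one call per created key
```

If the lock works as intended, the factory should run once for the root dict and once for "a", and all eight inner keys should survive.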

Which of the two implementations is more correct (if any)? Further suggestions for achieving thread safety in this case are also welcome.

Yuval