34

I have a dataclass object that has nested dataclass objects in it. However, when I create the main object, the nested objects turn into a dictionary:

@dataclass
class One:
    f_one: int
    f_two: str
    
@dataclass
class Two:
    f_three: str
    f_four: One


Two(**{'f_three': 'three', 'f_four': {'f_one': 1, 'f_two': 'two'}})

Two(f_three='three', f_four={'f_one': 1, 'f_two': 'two'})

obj = {'f_three': 'three', 'f_four': One(**{'f_one': 1, 'f_two': 'two'})}

Two(**obj)
Two(f_three='three', f_four=One(f_one=1, f_two='two'))

As you can see only **obj works.

Ideally I'd like to construct my object to get something like this:

Two(f_three='three', f_four=One(f_one=1, f_two='two'))

Is there any way to achieve that other than manually converting nested dictionaries to the corresponding dataclass objects whenever I access object attributes?

Thanks in advance.

Mark
mohi666
  • Your second approach works fine if you actually use `obj`. `Two(**obj)` gives me `Two(f_three='three', f_four=One(f_one=1, f_two='two'))` – Patrick Haugh Jul 27 '18 at 20:10
  • Thanks for pointing out my mistake. Any idea if it's possible to achieve the same result using the first approach? The second approach seems too tedious if you have multiple nested objects in your dataclass object. – mohi666 Jul 27 '18 at 20:27
  • Possible duplicate of [Python dataclass from dict](https://stackoverflow.com/questions/53376099/python-dataclass-from-dict) – Arne Mar 11 '19 at 15:04

9 Answers

33

This is a request that is as complex as the dataclasses module itself, which means that probably the best way to achieve this "nested fields" capability is to define a new decorator, akin to @dataclass.

Fortunately, if you don't need the signature of the __init__ method to reflect the fields and their defaults, like the classes rendered by calling dataclass, this can be a whole lot simpler: A class decorator that will call the original dataclass and wrap some functionality over its generated __init__ method can do it with a plain "...(*args, **kwargs):" style function.

In other words, all one needs to do is write a wrapper around the generated __init__ method that will inspect the parameters passed in "kwargs", check if any corresponds to a "dataclass field type", and if so, generate the nested object prior to calling the original __init__. Maybe this is harder to spell out in English than in Python:

from dataclasses import dataclass, is_dataclass

def nested_dataclass(*args, **kwargs):
    def wrapper(cls):
        # Build the regular dataclass first, then wrap its generated __init__.
        cls = dataclass(cls, **kwargs)
        original_init = cls.__init__
        def __init__(self, *args, **kwargs):
            # Only keyword arguments are inspected; positional ones pass through as-is.
            for name, value in kwargs.items():
                field_type = cls.__annotations__.get(name, None)
                if is_dataclass(field_type) and isinstance(value, dict):
                    # Replace the dict with an instance of the annotated dataclass.
                    new_obj = field_type(**value)
                    kwargs[name] = new_obj
            original_init(self, *args, **kwargs)
        cls.__init__ = __init__
        return cls
    return wrapper(args[0]) if args else wrapper

Note that, besides not preserving the __init__ signature, this also ignores the case of passing init=False, since that would be meaningless here anyway.

(The conditional in the return line lets this work both when called with named parameters and when applied directly as a decorator, like dataclass itself.)
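
As a hedged illustration of that last point, the sketch below repeats the decorator and applies it once bare and once with keyword arguments. `frozen=True` is just one example of an option forwarded to dataclass, and the class names `Inner`, `Plain`, and `Frozen` are made up for this demonstration.

```python
from dataclasses import FrozenInstanceError, dataclass, is_dataclass


def nested_dataclass(*args, **kwargs):
    def wrapper(cls):
        cls = dataclass(cls, **kwargs)
        original_init = cls.__init__

        def __init__(self, *args, **kwargs):
            for name, value in kwargs.items():
                field_type = cls.__annotations__.get(name, None)
                if is_dataclass(field_type) and isinstance(value, dict):
                    kwargs[name] = field_type(**value)
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls

    # Bare use passes the class positionally; keyword use passes no positionals.
    return wrapper(args[0]) if args else wrapper


@dataclass
class Inner:
    x: int


@nested_dataclass  # bare: args == (Plain,)
class Plain:
    inner: Inner


@nested_dataclass(frozen=True)  # with options: args == (), kwargs == {"frozen": True}
class Frozen:
    inner: Inner


plain = Plain(inner={"x": 1})
frozen = Frozen(inner={"x": 2})

try:
    frozen.inner = Inner(x=3)  # rejected because frozen=True was forwarded
    was_frozen = False
except FrozenInstanceError:
    was_frozen = True
```

In both cases the nested dict is converted before the generated __init__ runs, so the frozen variant needs no special handling.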

And on the interactive prompt:

In [85]: @dataclass
    ...: class A:
    ...:     b: int = 0
    ...:     c: str = ""
    ...:         

In [86]: @dataclass
    ...: class A:
    ...:     one: int = 0
    ...:     two: str = ""
    ...:     
    ...:         

In [87]: @nested_dataclass
    ...: class B:
    ...:     three: A
    ...:     four: str
    ...:     

In [88]: @nested_dataclass
    ...: class C:
    ...:     five: B
    ...:     six: str
    ...:     
    ...:     

In [89]: obj = C(five={"three":{"one": 23, "two":"narf"}, "four": "zort"}, six="fnord")

In [90]: obj.five.three.two
Out[90]: 'narf'

If you want the signature to be kept, I'd recommend using the private helper functions in the dataclasses module itself, to create a new __init__.
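
A lighter alternative to the private helpers, offered here only as a sketch and not as the approach recommended above, is to decorate the wrapper with `functools.wraps`. That sets `__wrapped__` on the new `__init__`, and `inspect.signature` follows it, so introspection reports the generated signature even though the function itself still takes `*args, **kwargs`. The simplified decorator below takes no options, and the classes `A` and `B` are illustrative.

```python
import functools
import inspect
from dataclasses import dataclass, is_dataclass


def nested_dataclass(cls):
    cls = dataclass(cls)
    original_init = cls.__init__

    @functools.wraps(original_init)  # sets __wrapped__, which inspect.signature follows
    def __init__(self, *args, **kwargs):
        for name, value in kwargs.items():
            field_type = cls.__annotations__.get(name)
            if is_dataclass(field_type) and isinstance(value, dict):
                kwargs[name] = field_type(**value)
        original_init(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls


@dataclass
class A:
    one: int = 0


@nested_dataclass
class B:
    three: A
    four: str = ""


# Parameter names as seen by introspection tools.
params = list(inspect.signature(B.__init__).parameters)
obj = B(three={"one": 23}, four="zort")
```

This only changes what introspection reports; argument checking still happens inside the original __init__ when the wrapper calls it.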

Oliver
jsbueno
  • You have chained attrs with a decorator. That's awesome. – pylang Sep 09 '18 at 02:50
  • Indeed - I think this snippet may deserve a Pypi module on its own. I see that it is published. – jsbueno Sep 10 '18 at 05:11
  • For the record, `dataclasses.is_dataclass(f.type)` returns False for fields of type `List[dataclass]`, so your decorator skips over such fields. See https://stackoverflow.com/questions/53376099/python-dataclasse-from-dict?noredirect=1#comment93683831_53376099 – mbatchkarov Nov 21 '18 at 09:48
  • Note that this approach works; however, dataclasses use deepcopy() internally, which can slow things down significantly when it comes to data serialization of large objects. – mohi666 Mar 12 '19 at 18:36
  • *update*: people needing this, please check the "pydantic" library - I think it can handle this, with enough code to provide for the corner cases. – jsbueno Nov 01 '19 at 02:57
  • I like to add a `elif field_type is None` clause so that I can `warn()` about ignored extra fields. Then `else: new_kwargs[name] = value` to finish it off. – I'll Eat My Hat Nov 07 '19 at 03:28
  • pointing to @mbatchkarov comment. I am looking for exactly that. Anyone? – M.wol Sep 10 '21 at 06:05
  • It is an "elif" condition and 2 more lines on the code above. One should be able to do that on their own. (I won't be updating the answer now, but I might rewrite this code as a gist so it works with lists) – jsbueno Sep 10 '21 at 11:30
18

You can use __post_init__ for this:

from dataclasses import dataclass
@dataclass
class One:
    f_one: int
    f_two: str

@dataclass
class Two:
    f_three: str
    f_four: One
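    def __post_init__(self):
        # Convert only when a dict was passed, so an existing One instance still works.
        if isinstance(self.f_four, dict):
            self.f_four = One(**self.f_four)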

data = {'f_three': 'three', 'f_four': {'f_one': 1, 'f_two': 'two'}}

print(Two(**data))
# Two(f_three='three', f_four=One(f_one=1, f_two='two'))
16

You can try the dacite module. This package simplifies the creation of data classes from dictionaries, and it also supports nested structures.

Example:

from dataclasses import dataclass
from dacite import from_dict

@dataclass
class A:
    x: str
    y: int

@dataclass
class B:
    a: A

data = {
    'a': {
        'x': 'test',
        'y': 1,
    }
}

result = from_dict(data_class=B, data=data)

assert result == B(a=A(x='test', y=1))

To install dacite, simply use pip:

$ pip install dacite
Konrad Hałas
13

Instead of writing a new decorator I came up with a function modifying all fields of type dataclass after the actual dataclass is initialized.

import dataclasses

def dicts_to_dataclasses(instance):
    """Convert all fields of type `dataclass` into an instance of the
    specified data class if the current value is of type dict."""
    cls = type(instance)
    for f in dataclasses.fields(cls):
        if not dataclasses.is_dataclass(f.type):
            continue

        value = getattr(instance, f.name)
        if not isinstance(value, dict):
            continue

        new_value = f.type(**value)
        setattr(instance, f.name, new_value)

The function could be called manually or in __post_init__. This way the @dataclass decorator can be used in all its glory.

The example from above with a call to __post_init__:

from dataclasses import dataclass

@dataclass
class One:
    f_one: int
    f_two: str

@dataclass
class Two:
    def __post_init__(self):
        dicts_to_dataclasses(self)

    f_three: str
    f_four: One

data = {'f_three': 'three', 'f_four': {'f_one': 1, 'f_two': 'two'}}

two = Two(**data)
# Two(f_three='three', f_four=One(f_one=1, f_two='two'))
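
The helper above converts one level per call, so deeper structures rely on each nested class calling it from its own __post_init__. As a hedged sketch of the "called manually" option, the hypothetical variant below (`dicts_to_dataclasses_recursive` is not part of the original answer) descends into the converted values itself. It assumes the field annotations are real classes, not strings, and the `Leaf`, `Branch`, and `Tree` classes are illustrative.

```python
import dataclasses
from dataclasses import dataclass


def dicts_to_dataclasses_recursive(instance):
    """Convert dict values of dataclass-typed fields, descending into the results."""
    for f in dataclasses.fields(instance):
        if not dataclasses.is_dataclass(f.type):
            continue
        value = getattr(instance, f.name)
        if isinstance(value, dict):
            value = f.type(**value)
            setattr(instance, f.name, value)
        if dataclasses.is_dataclass(value):
            # Handle dicts nested one level further down.
            dicts_to_dataclasses_recursive(value)
    return instance


@dataclass
class Leaf:
    n: int


@dataclass
class Branch:
    leaf: Leaf


@dataclass
class Tree:
    branch: Branch


# Manual call after construction; no __post_init__ is needed on any class.
tree = dicts_to_dataclasses_recursive(Tree(branch={"leaf": {"n": 1}}))
```

Because it uses setattr, this variant would not work on frozen dataclasses.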
Yourstruly
6

I have created an augmentation of the solution by @jsbueno that also accepts typing in the form List[<your class>].

from dataclasses import dataclass, is_dataclass
from typing import get_args, get_origin  # Python 3.8+


def nested_dataclass(*args, **kwargs):
    def wrapper(cls):
        cls = dataclass(cls, **kwargs)
        original_init = cls.__init__

        def __init__(self, *args, **kwargs):
            for name, value in kwargs.items():
                field_type = cls.__annotations__.get(name, None)
                # get_origin() returns None for non-generic annotations (bare
                # `list`, missing annotation), so no AttributeError is raised.
                if isinstance(value, list) and get_origin(field_type) is list:
                    item_args = get_args(field_type)
                    if item_args and is_dataclass(item_args[0]):
                        sub_type = item_args[0]
                        # Convert dict items; keep any other items unchanged.
                        kwargs[name] = [
                            sub_type(**child) if isinstance(child, dict) else child
                            for child in value
                        ]
                elif is_dataclass(field_type) and isinstance(value, dict):
                    kwargs[name] = field_type(**value)
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls

    return wrapper(args[0]) if args else wrapper
Daan Luttik
  • Using your decorator I get: AttributeError: type object 'list' has no attribute '__origin__' if one of dataclasses attribute is annotated List[SomeClass] – M.wol Sep 24 '21 at 12:13
2

If you are okay with pairing this functionality with the non-stdlib library attrs (which offers a superset of what the stdlib dataclasses module provides), then the cattrs library provides a structure function that converts native data types to dataclasses and uses the type annotations automatically.

Alex Waygood
1

The more important question is not nesting but value validation / casting. Do you need validation of values?

If value validation is needed, stay with well-tested deserialization libs like:

  • pydantic (faster, but its reserved attribute names, such as schema, clash with attribute names coming from the data; you have to rename and alias class properties often enough for it to be annoying)
  • schematics (slower than pydantic, but with a much more mature typecasting stack)

They have amazing validation and re-casting support and are very widely used (meaning they should generally work well and not mess up your data). However, they are not dataclass-based, though Pydantic wraps dataclass functionality and lets you switch from pure dataclasses to Pydantic-supported dataclasses by changing an import statement.

These libs (mentioned in this thread) work with dataclasses natively, but their validation / typecasting is not hardened yet:

  • dacite
  • validated_dc

If validation is not super important, and just recursive nesting is needed, simple hand-rolled code like https://gist.github.com/dvdotsenko/07deeafb27847851631bfe4b4ddd9059 is enough to deal with Optional, List[...], and Dict[...] of nested models.
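
To give a rough idea of what such hand-rolled code can look like, here is a minimal sketch. It is not the code from the linked gist: the function name `build` and the classes `Item` and `Box` are invented for illustration, it assumes Python 3.8+ for `typing.get_origin` / `typing.get_args`, and it does no validation or casting at all.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints


def build(tp: Any, value: Any) -> Any:
    """Recursively shape `value` after annotation `tp`; no validation or casting."""
    if dataclasses.is_dataclass(tp) and isinstance(value, dict):
        hints = get_type_hints(tp)
        return tp(**{k: build(hints.get(k), v) for k, v in value.items()})
    origin, args = get_origin(tp), get_args(tp)
    if origin is Union:
        # Optional[X] is Union[X, None]; only the single-alternative case is handled.
        non_none = [a for a in args if a is not type(None)]
        if value is not None and len(non_none) == 1:
            return build(non_none[0], value)
        return value
    if origin is list and isinstance(value, list) and args:
        return [build(args[0], v) for v in value]
    if origin is dict and isinstance(value, dict) and len(args) == 2:
        return {k: build(args[1], v) for k, v in value.items()}
    return value  # plain values (and anything unrecognized) pass through


@dataclass
class Item:
    name: str


@dataclass
class Box:
    main: Item
    spare: Optional[Item]
    items: List[Item]
    by_key: Dict[str, Item]


box = build(Box, {
    "main": {"name": "a"},
    "spare": None,
    "items": [{"name": "b"}],
    "by_key": {"k": {"name": "c"}},
})
```

Unions with several non-None alternatives are left untouched here, which is one of the corner cases the dedicated libraries handle.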

ddotsenko
0
from dataclasses import dataclass, asdict

from validated_dc import ValidatedDC


@dataclass
class Foo(ValidatedDC):
    one: int
    two: str


@dataclass
class Bar(ValidatedDC):
    three: str
    foo: Foo


data = {'three': 'three', 'foo': {'one': 1, 'two': 'two'}}
bar = Bar(**data)
assert bar == Bar(three='three', foo=Foo(one=1, two='two'))

data = {'three': 'three', 'foo': Foo(**{'one': 1, 'two': 'two'})}
bar = Bar(**data)
assert bar == Bar(three='three', foo=Foo(one=1, two='two'))

# Use asdict() to work with the dictionary:

bar_dict = asdict(bar)
assert bar_dict == {'three': 'three', 'foo': {'one': 1, 'two': 'two'}}

foo_dict = asdict(bar.foo)
assert foo_dict == {'one': 1, 'two': 'two'}

ValidatedDC: https://github.com/EvgeniyBurdin/validated_dc

Evgeniy_Burdin
0

dataclass-wizard is a modern option that can alternatively work for you. It supports complex types such as date and time, generics from the typing module, and a nested dataclass structure.

Other "nice to have" features, such as implicit key casing transforms (i.e. camelCase and TitleCase, which are quite common in API responses), are likewise supported out of the box.

The "new style" annotations introduced in PEPs 585 and 604 can be ported back to Python 3.7 via a __future__ import as shown below.

from __future__ import annotations
from dataclasses import dataclass
from dataclass_wizard import fromdict, asdict, DumpMeta


@dataclass
class Two:
    f_three: str | None
    f_four: list[One]


@dataclass
class One:
    f_one: int
    f_two: str


data = {'f_three': 'three',
        'f_four': [{'f_one': 1, 'f_two': 'two'},
                   {'f_one': '2', 'f_two': 'something else'}]}

two = fromdict(Two, data)
print(two)

# setup key transform for serialization (default is camelCase)
DumpMeta(key_transform='SNAKE').bind_to(Two)

my_dict = asdict(two)
print(my_dict)

Output:

Two(f_three='three', f_four=[One(f_one=1, f_two='two'), One(f_one=2, f_two='something else')])
{'f_three': 'three', 'f_four': [{'f_one': 1, 'f_two': 'two'}, {'f_one': 2, 'f_two': 'something else'}]}

You can install Dataclass Wizard via pip:

$ pip install dataclass-wizard
rv.kvetch