2

Dataclass example:

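from dataclasses import dataclass
from typing import List  # the annotation inside `class List` resolves to typing.List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]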

JSON example:

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

I can unpack the JSON doing something like this:

object = List(**json)

But I'm not sure how I can also unpack each entry in statuses into a StatusElement object and append it to the statuses list of the List object. I'm sure I need to loop over it somehow, but I'm not sure how to combine that with unpacking.

Zach Johnson

2 Answers

5

Python's dataclasses module is great, but unfortunately one thing it doesn't handle is parsing a JSON object into a nested dataclass structure.

A few workarounds exist for this:

  • You can roll your own JSON parsing helper method, for example a from_json that converts a JSON string to a List instance with a nested dataclass.
  • You can make use of existing JSON serialization libraries. For example, pydantic is a popular one that supports this use case.
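
The first option above can be sketched with the standard library alone. This is a minimal illustration, not a general solution: the `from_dict` classmethod is a hypothetical helper name, the field names follow the question's JSON, and the explicit `int(...)` call is an assumption about how the string id "124" should be handled, since plain dataclasses do not coerce types.

```python
from dataclasses import dataclass
from typing import Any, Dict, List as PyList


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: PyList[StatusElement]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "List":
        # Hypothetical helper: unpack each nested dict into a StatusElement
        # first, then pass the converted list to the outer constructor.
        statuses = [StatusElement(**item) for item in data["statuses"]]
        # Dataclasses do not convert types, so turn "124" into an int here.
        return cls(id=int(data["id"]), statuses=statuses)


payload = {
    "id": "124",
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open",
        }
    ],
}

lst = List.from_dict(payload)
print(lst)
```

This works for one level of nesting with known field names; each additional nested dataclass needs its own conversion step, which is the boilerplate the libraries below take care of.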

Here is an example using the dataclass-wizard library, which works well enough for your use case. It's more lightweight than pydantic and coincidentally also a little faster. It also supports automatic key case transforms and type conversions (for example, a str value to a field annotated as int).

Example below:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import JSONWizard


@dataclass
class List(JSONWizard):
    id: int
    statuses: PyList['StatusElement']
    # on Python 3.9+ you can use the following syntax:
    #   statuses: list['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}


object = List.from_dict(json)

print(repr(object))
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

Disclaimer: I am the creator (and maintainer) of this library.


As of the latest release of dataclass-wizard, you can skip the class inheritance. Usage is straightforward: the example is the same as above, with the JSONWizard usage removed completely. Just make sure you import asdict from dataclass_wizard rather than from the dataclasses module, even though I guess the latter should coincidentally work.

Here's the modified version of the above without class inheritance:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import fromdict, asdict


@dataclass
class List:
    id: int
    statuses: PyList['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

# De-serialize the JSON dictionary into a `List` instance.
c = fromdict(List, json)

print(c)
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

# Convert the instance back to a dictionary object that is JSON-serializable.
d = asdict(c)

print(d)
# {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}

Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use (and there's also no need to inherit from any class). However, from my personal tests - Windows 10 Alienware PC using Python 3.9.1 - dataclass-wizard seemed to perform much better overall on the de-serialization process.

from dataclasses import dataclass
from timeit import timeit
from typing import List

from dacite import from_dict

from dataclass_wizard import JSONWizard, fromdict


data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


class ListWiz(List, JSONWizard):
    ...


n = 100_000

# 0.37
print('dataclass-wizard:            ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))

# 0.36
print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))

# 11.2
print('dacite:                      ', timeit('from_dict(List, data)', number=n, globals=globals()))


lst_wiz1 = ListWiz.from_dict(data)
lst_wiz2 = fromdict(List, data)  # dataclass-wizard helper, not dacite's from_dict
lst = from_dict(List, data)

# True
assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__
rv.kvetch
  • 1
    This looks really slick. I've looked at pydantic and seems a bit heavy for what I'm trying to do. I'll have to give this library a shot. Thanks! – Zach Johnson Sep 04 '21 at 22:59
  • 1
    can confirm it's really faster, almost like default one: `dacite - .040896 ms`, `dataclass_wizard - .002921 ms`, `double ** extraction in a loop - .001776 ms` – Oleg May 15 '22 at 17:47
3

A "cleaner" solution (in my eyes): use dacite.

No need to inherit anything.

from dataclasses import dataclass
from typing import List
from dacite import from_dict

data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


lst: List = from_dict(List, data)
print(lst)

output

List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
balderman
  • 1
    This is also a very cool solution - I'll admit I hadn't tried `dacite` before. However, from personal tests `dacite` ended up being about **30x slower** in the de-serialization process (I might be missing an optimization step however) – rv.kvetch Sep 05 '21 at 16:23
  • 1
    But if we absolutely need to, we can also call `fromdict(data, List)` without extending from any class. Where the import is generated with `from dataclass_wizard.loaders import fromdict`. But just a note that this is technically not public API, so it might change in a future release. – rv.kvetch Sep 05 '21 at 17:27
  • 1
    Just a note, but I took the suggestion about inheritance being unnecessary to heart - the latest version of `dataclass-wizard` should now support a `fromdict` so regular data classes should work as well. I updated my answer above. – rv.kvetch Sep 06 '21 at 18:53
  • wondering why `dacite` is so slow – Oleg May 15 '22 at 17:50