I use dictionaries as data structures a lot in my code. Instead of returning several values as a tuple, as Python permits:

def do_smth():
  [...]
  return val1, val2, val3

I prefer to use a dictionary, which has the advantage of named keys. But with a complex nested dictionary it's hard to navigate inside it. When I was coding in JS several years ago I liked dictionaries too, because I could access a sub-part like thing.stuff.foo and the IDE helped me with the structure.
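To make the comparison concrete, here is a minimal sketch of the three return styles side by side. The function names, the `Stats` class and the sample values are hypothetical, made up only for illustration:

```python
from dataclasses import dataclass


def stats_as_tuple(values):
    # Positional: the caller has to remember the order of the results.
    return min(values), max(values), sum(values) / len(values)


def stats_as_dict(values):
    # Named keys, but the structure is only known at runtime.
    return {"low": min(values), "high": max(values), "mean": sum(values) / len(values)}


@dataclass
class Stats:  # hypothetical container for the three results
    low: float
    high: float
    mean: float


def stats_as_dataclass(values) -> Stats:
    # Named attributes that an IDE or type checker can follow.
    return Stats(low=min(values), high=max(values), mean=sum(values) / len(values))


data = [1.0, 2.0, 3.0]
print(stats_as_tuple(data)[2])        # access by index
print(stats_as_dict(data)["mean"])    # access by key
print(stats_as_dataclass(data).mean)  # access by attribute
```

The attribute form is the closest to the thing.stuff.foo style, and it is the one editors can usually autocomplete.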

I just discovered the new dataclass in Python and I'm not sure what it is for, other than replacing a dictionary? From what I have read, a dataclass cannot have functions inside and the initialization of its arguments is simplified.

I would like to hear comments about this: how do you use a dataclass, or dictionaries, in Python?

salty-horse
Ragnar

3 Answers


Dataclasses are more of a replacement for NamedTuples than for dictionaries.

Whilst NamedTuples are designed to be immutable, dataclasses can offer that functionality by setting frozen=True in the decorator, but provide much more flexibility overall.
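As a small sketch of what frozen=True does (the `Point` class is a hypothetical example, not from the question): assigning to a field of a frozen instance raises `dataclasses.FrozenInstanceError`, and because `eq` is on by default the instance also becomes hashable.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # rejected: frozen instances do not allow field assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# Frozen dataclasses (with the default eq=True) are hashable,
# so they can be used as dictionary keys or set members.
lookup = {p: "origin-ish"}
```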

If you are into type hints in your Python code, they really come into play.

The other advantage is like you said - complex nested dictionaries. You can define Dataclasses as your types, and represent them within Dataclasses in a clear and concise way.

Consider the following:

from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]

You can then write functions that annotate a parameter with the dataclass name as a type hint and access its attributes (similar to passing in a dictionary and accessing its keys), or alternatively construct the dataclass and return it, i.e.

def get_locations(....) -> Locations:
....

It makes the code very readable, as opposed to a large, complicated dictionary.
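A minimal sketch of how the nested types above might be built and consumed. The `total_population` helper and all the values are made up for illustration; they are not part of the original answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


@dataclass
class Locations:
    countries: List[Country]


def total_population(locations: Locations) -> int:
    # Hypothetical helper: walks the nested structure through attributes.
    return sum(
        city.population
        for country in locations.countries
        for city in country.cities
    )


# Illustrative, made-up data.
locations = Locations(
    countries=[
        Country(
            code="FR",
            currency="EUR",
            cities=[City("PAR", 2_100_000), City("LYS", 500_000)],
        )
    ]
)

first_city_code = locations.countries[0].cities[0].code
print(first_city_code, total_population(locations))
```

Compared with `data["countries"][0]["cities"][0]["code"]`, a misspelled attribute can be caught by a type checker before the code runs.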

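You can also set defaults, much as you can fall back to a default value when reading from a dictionary. (NamedTuples support defaults as well: typing.NamedTuple through its class syntax, and collections.namedtuple through the defaults argument since Python 3.7.)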

@dataclass
class Stock:
   quantity: int = 0
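One caveat worth sketching: a mutable default such as a list cannot be written directly as `= []` (the decorator raises `ValueError`); `field(default_factory=...)` is the supported way. The `tags` field below is a hypothetical addition for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stock:
    quantity: int = 0
    # Hypothetical extra field: each instance gets its own fresh list.
    tags: List[str] = field(default_factory=list)


a = Stock()            # uses both defaults
b = Stock(quantity=5)  # overrides one default
a.tags.append("fragile")  # does not leak into b.tags
```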

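You can also control in the decorator whether the dataclass supports ordering comparisons (order=True generates __lt__, __le__, __gt__ and __ge__), just like whether you want it to be frozen. Dictionaries do preserve insertion order since Python 3.7, but two dictionaries cannot be compared with < or >. See the dataclasses documentation for more information.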
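For reference, a short sketch of what order=True provides: it generates the comparison methods, which compare instances as if they were tuples of their fields in definition order. The `Version` class is a hypothetical example.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical example class
    major: int
    minor: int


versions = [Version(1, 4), Version(0, 9), Version(1, 2)]

# sorted() works because order=True generated __lt__ and friends.
ordered = sorted(versions)
newest = max(versions)
```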

You get object comparison if you want it, e.g. __eq__() (generated by default). Dataclasses also come with __init__ and __repr__ by default, so you don't have to write those methods by hand as with normal classes.
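A quick sketch of the difference, using a hand-written `PlainCity` class (hypothetical, for contrast) next to a dataclass with the same fields:

```python
from dataclasses import dataclass


@dataclass
class City:
    code: str
    population: int


class PlainCity:
    # Hand-written equivalent without __eq__ or __repr__.
    def __init__(self, code, population):
        self.code = code
        self.population = population


a = City("PAR", 100)
b = City("PAR", 100)

same = a == b   # True: generated __eq__ compares field values
text = repr(a)  # "City(code='PAR', population=100)"

# Without __eq__, a plain class falls back to identity comparison.
plain_same = PlainCity("PAR", 100) == PlainCity("PAR", 100)  # False
```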

There is also substantially more control over fields, allowing metadata etc.
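As an illustration of per-field control, `dataclasses.field` accepts options such as `repr` and `metadata`, and `dataclasses.fields` lets you read them back. The `Measurement` class and the `"unit"` metadata key are assumptions made up for this sketch; dataclasses themselves do not interpret metadata.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Measurement:  # hypothetical example class
    # metadata is a free-form mapping for your own tooling.
    value: float = field(metadata={"unit": "m"})
    # repr=False hides this field from the generated __repr__.
    label: str = field(default="", repr=False)


m = Measurement(3.5, label="height")
text = repr(m)

# Read the metadata back through the field definitions.
units = {f.name: f.metadata.get("unit") for f in fields(Measurement)}
```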

And lastly, you can convert it into a dictionary at the end with asdict, imported with: from dataclasses import dataclass, asdict
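A brief sketch of the conversion, with made-up values. `asdict` recurses into nested dataclasses and into lists, so the result is plain dictionaries all the way down (handy for JSON serialization, for instance).

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class City:
    code: str
    population: int


@dataclass
class Country:
    code: str
    currency: str
    cities: List[City]


# Illustrative, made-up data.
country = Country("FR", "EUR", [City("PAR", 100)])

# Nested dataclasses inside the list are converted too.
as_dict = asdict(country)
print(as_dict)
```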

264nm
    "whereas normal dictionaries are not ordered" slight correction, since python 3.7 dictionaries are indeed ordered 2 years before your posting: https://gandenberger.org/2018/03/10/ordered-dicts-vs-ordereddict/ – Jonesn11 May 02 '22 at 21:34

My take on it.

A DataClass isn't there to necessarily replace a dictionary. Rather it is used as an object to hold some data where it makes sense in the modeling of an application.

Let's say we are building a simple address book. Assuming it is just storing some data, the Person class can be a dataclass with fields like name, phone_number, etc. We can then use a dictionary to create a lookup of name to Person such that we can retrieve this data class by name.

from dataclasses import dataclass


@dataclass
class Person:
    # Annotated fields let @dataclass generate __init__, __repr__ and __eq__.
    name: str
    address: str
    phone_number: str

then elsewhere in the app:

persons = <LIST OF PERSONS>
address_book = {person.name: person for person in persons}
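A self-contained sketch of the same idea, with the placeholder list filled in by made-up entries purely for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    name: str
    address: str
    phone_number: str


# Illustrative, made-up entries.
persons: List[Person] = [
    Person("Ada", "1 Example Street", "555-0100"),
    Person("Linus", "2 Example Avenue", "555-0101"),
]

# The dictionary handles lookup; the dataclass describes each record.
address_book: Dict[str, Person] = {person.name: person for person in persons}

ada = address_book["Ada"]
print(ada.phone_number)
```

Here the two structures complement each other: the dictionary is the index, the dataclass is the value type.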

It is a rudimentary example but I hope it gets the idea across.

Of course, one could ask why use a dataclass when a namedtuple would suffice.

Others have written on that topic:

k88

Go for it. In pure OO it is fine to have pure data classes, especially if you are dealing with multi-threading. Still, my advice is to keep this information only where it is needed and used (mixing the data class with functionality).
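On the point of mixing data with functionality: a dataclass is a regular class, so it can define methods alongside its fields. A minimal sketch, with a hypothetical `Rectangle` class:

```python
from dataclasses import dataclass


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float

    def area(self) -> float:
        # Behaviour lives next to the data it operates on.
        return self.width * self.height


r = Rectangle(2.0, 3.0)
print(r.area())
```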

BioShock
  • I'm working with data, so I'm more into functional programming. I barely use OO (classes, inheritance...), but can you tell me more about the multi-threading point you raised? – Ragnar Feb 04 '20 at 10:50
  • @Ragnar how do you work with data? Are you using pandas? How do you write column names? Is it like df['column']? – cikatomo May 15 '21 at 21:28
  • Yes pandas all the way or PySpark RDD. – Ragnar May 16 '21 at 14:00