Python Dataclasses

A Data Class is a collection of data, also known as Data Elements, that are related to each other in some way. For example, each Data Element could appear in the same table of a database, or the same section within a form.

Table of Contents

  1. How Dataclasses Help You Write Better Classes
  2. Unlocking the Power of Class Properties
  3. What's the Difference Between str and repr?
  4. Dataclasses parameters

How Dataclasses Help You Write Better Classes

Data Classes are the building blocks of a Data Model. Each Data Class contains several Data Elements, and each Data Element describes an individual field, variable, column or property.

You can also have a Data Class within a Data Class, known as a Nested Data Class, which can be a useful way of managing complex sets of data. There is no limit on the number of Nested Data Classes you can include.
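In Python terms, nesting can be sketched by using one data class as the type of a field in another. The Engine and Vehicle classes below are hypothetical names chosen for illustration; they are not part of the Car example used in the rest of this article.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Engine:  # hypothetical nested data class
    fuel: str
    horsepower: int


@dataclass
class Vehicle:  # hypothetical outer data class
    brand: str
    engine: Engine  # a data class used as the type of a field
    tags: list[str] = field(default_factory=list)


vehicle = Vehicle(brand="BMW", engine=Engine(fuel="diesel", horsepower=190))

# Nested fields are reached with ordinary attribute access.
print(vehicle.engine.horsepower)

# asdict() converts nested data classes recursively into plain dicts.
print(asdict(vehicle))
```

Nesting can go as deep as the data requires, although deeply nested models tend to be harder to construct and test.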

Python's dataclasses module, part of the standard library since Python 3.7, helps you write such classes with less boilerplate. What makes a class a data class is the @dataclass decorator just above the class definition. Beneath the class Car: line, you simply list the fields you want in your data class. The : notation used for the fields relies on variable annotations, a feature introduced in Python 3.6. Each field is declared with a type such as str or float. The annotations tell @dataclass which names are fields, but the types themselves are not enforced at runtime.

import random
import string
from dataclasses import dataclass, field
from typing import Optional

def generate_id() -> str:
    return "".join(random.choices(string.ascii_uppercase, k=12))

@dataclass(kw_only=True)  # kw_only is an argument of the decorator (Python 3.10+)
class Car:
    id: str = field(init=False, default_factory=generate_id)
    brand: str
    category: str
    year: int
    price: float = field(init=False, repr=False)  # assigned after creation
    used: bool = False
    registration: Optional[str] = None

    def __str__(self) -> str:
        return f"{self.brand} {self.year}"

def main() -> None:
    electric_car = Car(brand="Tesla", category="electric", year=2019)
    diesel_car = Car(brand="BMW", category="diesel", year=2019, registration="1234ABC", used=True)

    print(electric_car)     # output: Tesla 2019

    print(repr(diesel_car))
    # output (the id is random): Car(id='EKOLFEPRWBVF', brand='BMW', category='diesel', year=2019, used=True, registration='1234ABC')
    print(f"{electric_car!r}")
    # output: Car(id='RDXGEDMUQHBH', brand='Tesla', category='electric', year=2019, used=False, registration=None)

if __name__ == "__main__":
    main()

id: This is a unique identifier for the car. It is generated automatically when the car is created. The id is a string of 12 uppercase letters. However, since it is generated automatically, we don’t want to allow it to be set by the user. Therefore, we use the init=False option to tell dataclass not to include this field in the __init__() method, and default_factory=generate_id to produce a fresh value for each instance.

brand: This is the brand of the car. It is a string and is required. Therefore, we don’t need to specify any options for this field.

category: This is the category of the car. It is a string and is required. Therefore, we don’t need to specify any options for this field.

year: This is the year the car was manufactured. It is an integer and is required. Therefore, we don’t need to specify any options for this field.

price: This is the price of the car. It is a float, but it is not passed to the constructor: init=False removes it from the generated __init__() method, and repr=False hides it from the generated __repr__(). Because it has no default value, it must be assigned after the car is created; reading it before then raises an AttributeError.

used: This is a boolean value that indicates whether the car is used or not. It is not required and defaults to False.

registration: This is the registration number of the car. It is an optional string that defaults to None.

kw_only=True: This option, passed to the @dataclass decorator, tells dataclass that the fields in the data class can only be set using keyword arguments. A positional call such as Car("Tesla", "electric", 2019) therefore raises a TypeError.

default_factory: A function, called with no arguments, that returns the initial value of the field. It is called once for each new instance.

Unlocking the Power of Class Properties

Following the example above, we can add some properties to the Car class, such as taxes and search_string, and a total_cars_asset property to a new Parking class. Properties are a special kind of attribute that can have getter, setter, and deleter methods. They are accessed just like normal attributes, but behind the scenes the getter, setter, or deleter method is called. A property that defines only a getter is read-only.

@dataclass(kw_only=True)
class Car:
  ....

  @property
  def search_string(self) -> str:
    return f"{self.registration}"

  @search_string.setter
  def search_string(self, value: str) -> None:
    self.registration = value

  @property
  def taxes(self) -> float:
    return self.price * 0.45 if self.category == 'electric' else self.price * 0.25


@dataclass
class Parking:
  cars: list[Car] = field(default_factory=list)  # default_factory returns the initial value of the field -> cars = []
  title: str = "Untitled"

  @property
  def total_cars_asset(self) -> float:
    # Sum of the prices of all cars in the parking
    return sum(car.price for car in self.cars)


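def main() -> None:
    # Recreate the two cars from the previous example
    electric_car = Car(brand="Tesla", category="electric", year=2019)
    diesel_car = Car(brand="BMW", category="diesel", year=2019, registration="1234ABC", used=True)

    electric_car.price = 45000.00
    diesel_car.price = 23000.00

    print(electric_car.taxes)
    # output: 20250.0
    print(diesel_car.taxes)
    # output: 5750.0

    print(diesel_car.search_string)
    # output: 1234ABC
    try:
        diesel_car.taxes = 2300
    except AttributeError:
        # taxes has no setter, so assigning to it raises AttributeError
        print("taxes is read-only")
    diesel_car.search_string = '192HT2732'
    print(diesel_car.search_string)
    # output: 192HT2732. This property has a setter, so it can be changed

    parking = Parking(cars=[electric_car, diesel_car], title="North Parking")
    print(parking.total_cars_asset) # output: 68000.0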

if __name__ == "__main__":
    main()

What's the Difference Between str and repr?

While this representation of a Car class is explicit and readable, it is also very verbose. Let us add a more concise representation. In general, a Python object has two different string representations:

repr(obj) is defined by obj.__repr__() and should return a developer-friendly representation of obj. If possible, this should be code that can recreate obj. Data classes do this.

str(obj) is defined by obj.__str__() and should return a user-friendly representation of obj. Data classes do not implement a .__str__() method, so Python will fall back to the __repr__() method unless you define __str__() yourself, as the Car class does.
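The fallback can be seen in a minimal sketch. Point and LabeledPoint are hypothetical classes used only for this comparison, and the commented output assumes they are defined at module level.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical class without a custom __str__()
    x: int
    y: int


@dataclass
class LabeledPoint:  # hypothetical class with a custom __str__()
    x: int
    y: int

    def __str__(self) -> str:
        return f"({self.x}, {self.y})"


p = Point(1, 2)
q = LabeledPoint(1, 2)

print(repr(p))  # Point(x=1, y=2)
print(str(p))   # same text as repr(p), because no __str__() is defined
print(str(q))   # (1, 2)
print(repr(q))  # LabeledPoint(x=1, y=2)
```

print() and f-strings use str() by default, while the interactive prompt and the !r conversion use repr().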

__eq__(): With eq=True (the default), dataclass generates an __eq__() method. This method compares instances as if they were tuples of their fields, in order. Both instances in the comparison must be of the identical type.

__post_init__(): If the class defines this method, the generated __init__() calls it at the end. It allows for special processing after the fields have been initialized, for example computing a field that depends on other fields.

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b

The __init__() method generated by dataclass() does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call this method in a __post_init__() method:

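class Rectangle:
    # A regular (non-dataclass) base class with its own __init__()
    def __init__(self, height, width):
        self.height = height
        self.width = width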

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)

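If the class already defines __eq__(), the eq parameter is ignored and the class's own method is kept. The following version of Car defines its own __eq__() so that two cars are equal when they share a registration number: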

@dataclass
class Car:
  sort_index: str = field(init=False, repr=False)
  ....

  def __post_init__(self):
    # id and brand are already set when __post_init__() runs
    self.sort_index = f"id: {self.id}, brand: {self.brand}"

  def __eq__(self, other):
    if not isinstance(other, self.__class__):
      return False
    return self.registration == other.registration


import json

def main() -> None:
  cars = []
  for i in range(1, 5):
    # Each file is expected to hold the keyword arguments for one Car
    with open(f"cars/car{i}.json", "r") as f:
      car_info = json.load(f)
    cars.append(Car(**car_info))

  first_car = Car(brand="Tesla", category="electric", year=2019, registration="1234ABC")
  second_car = Car(brand="BMW", category="diesel", year=2019, registration="1234ABC")

  print(first_car == second_car) # output: True
  # The custom __eq__() method treats all cars with the same registration number as equal

if __name__ == "__main__":
    main()

Dataclasses parameters

We have other parameters of dataclass that we should look at before moving on:

  1. order: enables sorting of instances of the class. The default is False.
  2. frozen: when True, the values inside an instance of the class can't be modified after it's created. The default is False.

The full signature of the decorator, with its default values, is:

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
class Car:
  ......

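order: When the order parameter is set to True (the default is False), dataclass automatically generates the __lt__() (less than), __le__() (less or equal), __gt__() (greater than), and __ge__() (greater or equal) methods used for sorting. These methods compare instances as if they were tuples of their fields, in order.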
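As a rough illustration of order=True, the sketch below sorts a few instances by price. PricedCar is a hypothetical class, and field(compare=False) is used here as an assumption about which fields should take part in the comparison.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class PricedCar:  # hypothetical class for illustration
    price: float                       # the only field used for comparisons
    brand: str = field(compare=False)  # excluded from ==, <, <=, >, >=


cars = [
    PricedCar(45000.0, "Tesla"),
    PricedCar(23000.0, "BMW"),
    PricedCar(31000.0, "Audi"),
]

# sorted() relies on the generated __lt__() method.
cheapest_first = sorted(cars)
print([car.brand for car in cheapest_first])  # ['BMW', 'Audi', 'Tesla']
```

Because fields are compared in definition order, the field that should dominate the ordering has to come first, or the other fields have to be excluded with compare=False.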

init: When the init parameter is set to True (the default), an __init__() method will be generated.

repr: When the repr parameter is set to True (the default), a __repr__() method will be generated. The generated repr string will have the class name and the name and repr of each field, in the order they are defined in the class. Fields that are marked as being excluded from the repr are not included. For example: InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10).

eq: If true (the default), an __eq__() method will be generated. This method compares instances as if they were tuples of their fields, in order. Both instances in the comparison must be of the identical type. If the class already defines __eq__(), this parameter is ignored.
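The generated __eq__() can be sketched as follows. Coordinates and OtherCoordinates are hypothetical classes with identical fields, used to show that the type must match as well as the values.

```python
from dataclasses import dataclass


@dataclass
class Coordinates:  # hypothetical class
    lat: float
    lon: float


@dataclass
class OtherCoordinates:  # hypothetical class with the same fields
    lat: float
    lon: float


a = Coordinates(40.4, -3.7)
b = Coordinates(40.4, -3.7)
c = OtherCoordinates(40.4, -3.7)

print(a == b)  # True: same type and same field values
print(a is b)  # False: they are still two distinct objects
print(a == c)  # False: same values, but a different class
```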

frozen: Makes instances of the class immutable, so that fields cannot be changed after creation. If true (the default is False), assigning to fields will raise a FrozenInstanceError. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised.
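A minimal sketch of a frozen data class, using a hypothetical Plate class. It also shows dataclasses.replace(), which is one way to obtain a modified copy when the original cannot be changed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Plate:  # hypothetical class for illustration
    number: str
    country: str = "ES"


plate = Plate("1234ABC")

frozen_error = None
try:
    plate.number = "9999ZZZ"  # assignment is blocked on frozen instances
except FrozenInstanceError as exc:
    frozen_error = exc
    print(f"FrozenInstanceError: {exc}")

# replace() builds a new instance with some fields changed.
updated = replace(plate, number="9999ZZZ")
print(updated)

# With eq=True (the default) and frozen=True, instances are hashable.
print(len({plate, Plate("1234ABC")}))  # 1
```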

match_args: If true (the default is True), the __match_args__ tuple will be created from the list of parameters to the generated __init__() method (even if __init__() is not generated). If false, or if __match_args__ is already defined in the class, then __match_args__ will not be generated.

kw_only: If true (the default value is False), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the __init__() parameter generated from a keyword-only field must be specified with a keyword when __init__() is called. There is no effect on any other aspect of dataclasses. This parameter is available from Python 3.10.

slots: If true (the default is False), a __slots__ attribute will be generated and a new class will be returned instead of the original one. If __slots__ is already defined in the class, then TypeError is raised.
