Python Code Generators (NamedTuple and dataclass)

Rahul Beniwal
4 min read · Apr 3, 2023


Code generators can help us write clean code faster, provided we define the specification correctly. Today, let's talk about two popular choices for code generation in Python: NamedTuple and dataclass.

Let's start with a simple hand-written class:

class Point:

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return self.__class__.__qualname__ + f"(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

Here is the same class written with NamedTuple and with dataclass:

# NamedTuple
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int


# Dataclasses
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

In both cases the code generator implicitly writes the special (dunder) methods for us, such as the constructor, __repr__ and __eq__. The generated classes also carry metadata about their fields and come with helper functions and methods.
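To check what the generators actually wrote, we can exercise the generated methods directly. This is a minimal sketch; DPoint and NPoint are hypothetical names chosen so the two versions don't clash. It also shows one subtle difference: NamedTuple equality is plain tuple equality, so an NPoint compares equal to a bare tuple while a DPoint does not.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class DPoint:  # hypothetical name for the dataclass version
    x: int
    y: int


class NPoint(NamedTuple):  # hypothetical name for the NamedTuple version
    x: int
    y: int


# The generated constructors accept positional and keyword arguments
dp = DPoint(1, y=2)
nt = NPoint(1, y=2)

# The generated __repr__ looks like the hand-written one
print(repr(dp))  # DPoint(x=1, y=2)
print(repr(nt))  # NPoint(x=1, y=2)

# The generated __eq__ compares field values
print(dp == DPoint(1, 2))  # True
print(nt == NPoint(1, 2))  # True

# NamedTuple equality is tuple equality; dataclass equality is not
print(nt == (1, 2))  # True
print(dp == (1, 2))  # False
```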

Dataclass

  1. asdict: converts an instance's fields into a dict.
  2. astuple: converts an instance's fields into a tuple.
  3. replace: creates a new object from an existing one, replacing one or more attributes.
  4. __annotations__: holds the type annotations declared on the attributes.
  5. fields: returns the Field objects that dataclasses uses internally to describe each attribute. I'll come back to fields later in this post.
####### dataclasses #########

# c is an instance of the dataclass Point defined above
c = Point(1, 2)

# creating a dict from an instance's fields
from dataclasses import asdict
print("class_dict", asdict(c))
## Output -> class_dict {'x': 1, 'y': 2}


# creating a tuple from an instance's fields
from dataclasses import astuple
print("class_tuple", astuple(c))
## Output -> class_tuple (1, 2)


# creating a new instance from an existing one
from dataclasses import replace
c2 = replace(c, x=3)
print("new_object", c2)
## Output -> new_object Point(x=3, y=2)


# fetching all annotations from an instance
print("Annotations", c.__annotations__)
## Output -> Annotations {'x': <class 'int'>, 'y': <class 'int'>}


# extracting all fields from an instance
from dataclasses import fields
print(" ** Fields ** ")
class_fields = fields(c)
for field in class_fields:
    print(field)

## Output (exact form varies by Python version; memory addresses will differ) ->
# Field(name='x',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x7f4402f0e490>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f4402f0e490>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
# Field(name='y',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x7f4402f0e490>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f4402f0e490>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)
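Two details about these helpers are worth knowing: asdict() and astuple() also convert nested dataclasses recursively, and replace() returns a new object, leaving the original untouched. A small sketch, where the Line class is hypothetical and only used for illustration:

```python
from dataclasses import dataclass, asdict, astuple, replace


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Line:  # hypothetical class, for illustration only
    start: Point
    end: Point


line = Line(Point(0, 0), Point(3, 4))

# Nested dataclasses are converted recursively
print(asdict(line))   # {'start': {'x': 0, 'y': 0}, 'end': {'x': 3, 'y': 4}}
print(astuple(line))  # ((0, 0), (3, 4))

# replace() builds a new Line; the original is unchanged
moved = replace(line, end=Point(6, 8))
print(line.end)   # Point(x=3, y=4)
print(moved.end)  # Point(x=6, y=8)
```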

NamedTuple

  1. _asdict: returns the object's attributes as a dict.
  2. the built-in tuple(): since a NamedTuple is a tuple, this gives its attributes as a plain tuple.
  3. _replace: creates a new instance from an existing one, replacing one or more attributes.
  4. __annotations__: holds the type annotations declared on the attributes.
from typing import NamedTuple

# Point2 is the NamedTuple version of Point (a separate name avoids
# clashing with the dataclass Point)
class Point2(NamedTuple):
    x: int
    y: int

d = Point2(1, 2)

# creating a dict from the object's attributes
print("class_dict", d._asdict())
## Output -> class_dict {'x': 1, 'y': 2}

# creating a tuple from the object's attributes
print("class_tuple", tuple(d))
## Output -> class_tuple (1, 2)

# creating a new instance from an existing one
d2 = d._replace(x=3)
print("new_object", d2)
## Output -> new_object Point2(x=3, y=2)

# fetching all annotations from an instance
print("Annotations", d.__annotations__)
## Output -> Annotations {'x': <class 'int'>, 'y': <class 'int'>}
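NamedTuple has no fields() function, but the _fields class attribute plays a similar role: it is a tuple of field names, and _field_defaults maps fields to their default values. A short sketch, with a hypothetical Point3 that has a default for y:

```python
from typing import NamedTuple


class Point3(NamedTuple):  # hypothetical class with a default value
    x: int
    y: int = 10


p = Point3(1)

print(Point3._fields)          # ('x', 'y')
print(Point3._field_defaults)  # {'y': 10}

# Pair each field name with its value, similar to looping over fields()
for name in p._fields:
    print(name, getattr(p, name))
# x 1
# y 10
```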

Differences between dataclass and NamedTuple

  1. A NamedTuple can be unpacked, but a dataclass cannot.
# valid code
x_co, y_co = d

## invalid code: raises TypeError (a dataclass instance is not iterable)
x_co, y_co = c

2. NamedTuple instances are immutable, while dataclass instances are not.

# invalid code: raises AttributeError
d.x = 100

# valid code
c.x = 100
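The error you get in each case is worth seeing once. A minimal sketch, redefining Point and Point2 so it runs on its own: assigning to a NamedTuple field raises AttributeError, and the idiomatic way to "change" a NamedTuple is _replace(), which returns a new instance.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Point2(NamedTuple):
    x: int
    y: int


@dataclass
class Point:
    x: int
    y: int


d = Point2(1, 2)
c = Point(1, 2)

try:
    d.x = 100  # NamedTuple fields are read-only
except AttributeError as exc:
    print("AttributeError:", exc)

c.x = 100  # a default dataclass allows assignment
print(c)  # Point(x=100, y=2)

# To "change" a NamedTuple, build a new one instead
d = d._replace(x=100)
print(d)  # Point2(x=100, y=2)
```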

3. A NamedTuple is hashable (as long as all its values are), while a default dataclass is not.

# invalid code: raises TypeError: unhashable type: 'Point'
{c: "100"}

# valid code
{d: "100"}
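The reason a default dataclass is unhashable: with eq=True (the default) and frozen=False, dataclass sets __hash__ to None, since a mutable object whose hash depends on its fields would misbehave as a dict key. A NamedTuple, on the other hand, hashes like a plain tuple, so it is only hashable if all its values are. A quick check (Tagged is a hypothetical class holding a list):

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Point:
    x: int
    y: int


class Point2(NamedTuple):
    x: int
    y: int


class Tagged(NamedTuple):  # hypothetical NamedTuple holding a mutable list
    name: str
    tags: list


print(Point.__hash__)  # None: a default dataclass is unhashable
print(hash(Point2(1, 2)) == hash((1, 2)))  # True: hashed like a plain tuple

try:
    hash(Tagged("a", ["x"]))
except TypeError as exc:
    # the list inside makes the whole tuple unhashable
    print("TypeError:", exc)
```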

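4. A NamedTuple is a subclass of the built-in tuple, which is implemented in C, so its instances are plain tuples without a per-instance __dict__. A dataclass only generates ordinary Python methods for a regular class, so its instances store their attributes in a __dict__ (unless you pass slots=True, available from Python 3.10).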
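A quick way to see this structural difference is sketched below: the NamedTuple class is a tuple subclass whose instances have no __dict__, while the dataclass instance keeps its fields in an ordinary __dict__.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Point:
    x: int
    y: int


class Point2(NamedTuple):
    x: int
    y: int


c = Point(1, 2)
d = Point2(1, 2)

# A NamedTuple class is a tuple subclass with no per-instance __dict__
print(issubclass(Point2, tuple))  # True
print(hasattr(d, "__dict__"))     # False

# A dataclass instance is an ordinary object storing fields in __dict__
print(issubclass(Point, tuple))   # False
print(c.__dict__)                 # {'x': 1, 'y': 2}
```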

5. NamedTuple instances are iterable, while dataclass instances are not.

# valid code: prints the value of each field of d, one per line
for point in d:
    print(point)

# invalid code: raises TypeError (a dataclass instance is not iterable)
for point in c:
    print(point)
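If you do need to iterate over or unpack a dataclass, two options are sketched below: convert it with astuple() first, or give the class your own __iter__ method. IterPoint is a hypothetical class, and this is one possible approach rather than a built-in dataclass feature.

```python
from dataclasses import dataclass, astuple, fields


@dataclass
class Point:
    x: int
    y: int


c = Point(1, 2)

# Option 1: convert to a plain tuple first (recursive for nested dataclasses)
x_co, y_co = astuple(c)


@dataclass
class IterPoint:  # hypothetical dataclass that opts in to iteration
    x: int
    y: int

    def __iter__(self):
        # Option 2: yield each field's value in declaration order
        # (shallow, unlike astuple())
        for f in fields(self):
            yield getattr(self, f.name)


ip = IterPoint(1, 2)
for value in ip:
    print(value)
# 1
# 2

a, b = ip  # unpacking works too, since it relies on iteration
```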
Comparison Table

Note: these are the differences between a default dataclass and a NamedTuple; a dataclass can be tweaked for more advanced uses.

Tweaking the default dataclass

Adding ordering (the ability to sort and compare)

from dataclasses import dataclass

@dataclass(order=True)
class Point:
    x: int
    y: int = 10


p1 = Point(1, 2)
p2 = Point(-1, 2)

print(sorted([p1, p2]))
# [Point(x=-1, y=2), Point(x=1, y=2)]

print(p1 == p2)
# False
print(p1 > p2)
# True
print(p1 < p2)
# False
print(p1 >= p2)
# True
print(p1 <= p2)
# False

## Comparison works like comparing tuples: first by x, then by y,
# so the order in which attributes are declared matters.

order=True generates the __lt__, __le__, __gt__ and __ge__ methods (__eq__ is already generated by default, since eq=True).
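The generated ordering methods compare the fields as a tuple, in declaration order, and only between instances of the same class; against any other type they return NotImplemented, so Python raises TypeError. A quick sketch to confirm this:

```python
from dataclasses import dataclass, astuple


@dataclass(order=True)
class Point:
    x: int
    y: int = 10


p1 = Point(1, 2)
p2 = Point(1, 5)

# Same x, so y breaks the tie, exactly like comparing (1, 2) < (1, 5)
print(p1 < p2)  # True
print((p1 < p2) == (astuple(p1) < astuple(p2)))  # True

try:
    p1 < (1, 2)  # ordering against a plain tuple is not supported
except TypeError as exc:
    print("TypeError:", exc)
```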

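Making a dataclass immutable

from dataclasses import dataclass, replace

@dataclass(order=True, frozen=True)
class Point:
    x: int
    y: int = 10

p1 = Point(1, 2)

# invalid code: each of these raises dataclasses.FrozenInstanceError
p1.x = 3
del p1.x
p1.z = 100  # adding a new attribute is blocked too

# valid code: replace() builds a modified copy (changes are keyword-only)
p2 = replace(p1, x=3)

# valid code
{p2 : "100"}

Now Point instances are hashable and can be used as dictionary keys.

frozen=True generates __setattr__ and __delattr__ methods that raise dataclasses.FrozenInstanceError, instead of relying on properties for immutability. Combined with eq=True (the default), it also makes the dataclass generate a __hash__ based on the fields, unless the class defines its own __hash__.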
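A short sketch of frozen behaviour in practice. FrozenInstanceError is a subclass of AttributeError, so existing code that catches AttributeError keeps working, and equal frozen instances hash equally, which is what makes a dictionary lookup with a freshly built key work:

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Point:
    x: int
    y: int = 10


p1 = Point(1, 2)

try:
    p1.z = 100  # even adding a brand-new attribute is blocked
except FrozenInstanceError as exc:
    print("FrozenInstanceError:", exc)

print(issubclass(FrozenInstanceError, AttributeError))  # True

# Equal frozen instances hash equally, so they work as dict keys
p2 = replace(p1, x=3)
lookup = {p2: "100"}
print(lookup[Point(3, 2)])  # 100
```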

Using field() to customize per-field behavior

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True, unsafe_hash=True)
class Employee:
    emp_id: int = field()
    name: str = field()
    gender: str = field()
    salary: int = field(hash=False, repr=False, metadata={'units': 'USD'})
    age: int = field(hash=False)
    viewed_by: list = field(default_factory=list, compare=False, repr=False)

    def access(self, access_id):
        # record who viewed this employee and when
        self.viewed_by.append((access_id, datetime.now()))


e = Employee(1, "John", "M", 1000, 30)
f = Employee(2, "Jane", "F", 2000, 25)
e.access("Manager1")
e.access("Manager2")

f.access("Manager1")
f.access("Manager2")

result = sorted([e, f])
print(result)
## Output -> [Employee(emp_id=1, name='John', gender='M', age=30), Employee(emp_id=2, name='Jane', gender='F', age=25)]

print(e)
## Output -> Employee(emp_id=1, name='John', gender='M', age=30)

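1️⃣ hash=False excludes the attribute from the generated __hash__ method. (viewed_by is excluded too, because a field with compare=False is left out of the hash by default.)

2️⃣ repr=False leaves the attribute out of __repr__.

3️⃣ metadata can be accessed through fields(e)[3].metadata (salary is the fourth field).

4️⃣ default_factory is a zero-argument callable (here list) that is called to create a fresh value for every new instance, instead of sharing one default value; kind of like defaultdict.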
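The sketch below isolates default_factory and metadata on a hypothetical Team class. It also shows why default_factory exists: a bare mutable default such as [] is rejected with ValueError when the class is defined, because a single list would otherwise be shared by every instance.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Team:  # hypothetical class, for illustration only
    name: str
    budget: int = field(default=0, metadata={"units": "USD"})
    members: list = field(default_factory=list)


a = Team("A")
b = Team("B")
a.members.append("John")

# default_factory calls list() per instance, so the lists are not shared
print(a.members)  # ['John']
print(b.members)  # []

# metadata is a read-only mapping attached to the Field object
budget_field = next(f for f in fields(Team) if f.name == "budget")
print(budget_field.metadata["units"])  # USD

# A plain mutable default is rejected when the class is defined
try:
    @dataclass
    class Broken:  # hypothetical
        members: list = []
except ValueError as exc:
    print("ValueError:", exc)
```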

So dataclass provides more flexibility than NamedTuple.

Using Dataclass with Inheritance

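We often need to combine a dataclass with inheritance. The generated __init__ does not call the parent class's __init__, so dataclass provides a __post_init__ hook, which the generated __init__ calls after all fields have been set. Here it is used to initialize a regular (non-dataclass) parent class.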

import json
from dataclasses import dataclass, asdict


class Serialize:
    def __init__(self, kwargs):
        self.kwargs = kwargs

    def serialize(self):
        return json.dumps(self.kwargs)


@dataclass
class Point(Serialize):
    x: int
    y: int

    def __post_init__(self):
        # the generated __init__ does not call Serialize.__init__, so do it here
        super().__init__(asdict(self))


c = Point(1, 2)
print(c.serialize())

## Output -> {"x": 1, "y": 2}
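When the parent is itself a dataclass, the picture is simpler: the child inherits the parent's fields, which come first in the generated __init__. The parent's __post_init__ only runs if the child calls it through super(). A sketch with hypothetical Point2D and Point3D classes, using __post_init__ for validation:

```python
from dataclasses import dataclass, asdict


@dataclass
class Point2D:  # hypothetical base dataclass
    x: int
    y: int

    def __post_init__(self):
        # simple validation hook, runs after the generated __init__
        if self.x < 0 or self.y < 0:
            raise ValueError("coordinates must be non-negative")


@dataclass
class Point3D(Point2D):  # hypothetical subclass adding one field
    z: int = 0

    def __post_init__(self):
        super().__post_init__()  # keep the parent's validation
        if self.z < 0:
            raise ValueError("z must be non-negative")


p = Point3D(1, 2, 3)
print(p)          # Point3D(x=1, y=2, z=3)
print(asdict(p))  # {'x': 1, 'y': 2, 'z': 3}
```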

That's a big enough dose of code generators to start writing cleaner code.

Leave a 👏 if you like this
