Idiomatic Python (and a Touch of C++) — Type Hints, Protocols, Dataclasses
Session 29 of the 48-session learning series.
Why this session matters
This session belongs to the OOP & Languages track. Idiomatic Python reads like Python; novice Python often reads like Java written in Python's syntax. The gap shows up in code reviews, in interviews, and in maintainability over years. A bit of modern C++ contrast keeps you sharp on what "fast" and "low-level" actually mean.
Agenda
- Type hints, generics, TypeVar, ParamSpec: modern Python typing
- Protocols vs ABCs: structural vs nominal subtyping
- Dataclasses, frozen, slots; when to use Pydantic instead
- Pythonic idioms — comprehensions, context managers, generators, dunder methods
- A short detour: equivalent C++ idioms (RAII, templates, concept)
Pre-read (skim before the session)
- PEP 484 — Type hints
- PEP 544 — Protocols (structural typing)
- PEP 695 — Type Parameter Syntax (3.12+)
- Brett Slatkin — Effective Python (2nd ed.)
Deep dive
1. Why type hints
They don't enforce anything at runtime. So why bother?
- mypy / pyright catch a real class of bugs (None passed to a str param) at PR time.
- IDE autocomplete becomes useful: methods, fields, return values all surface.
- Documentation that doesn't drift — the type is the spec.
- Refactoring is safe — rename a field and the type checker finds every caller.
Cost: ~10% more keystrokes; pays back within a quarter on any non-trivial codebase.
2. Modern typing essentials
from typing import Optional
from collections.abc import Callable, Iterator, Sequence

def top_k(items: Sequence[int], k: int = 10) -> list[int]:
    # Largest k values, in descending order.
    return sorted(items, reverse=True)[:k]

def parse(text: str) -> Optional[dict]:  # Optional[X] means "X or None"
    ...

def stream() -> Iterator[bytes]:
    ...

# A callable that takes (str, int) and returns bool.
Handler = Callable[[str, int], bool]
Python 3.9+: list[int] instead of List[int]. 3.10+: int | None instead of Optional[int]. 3.12+: cleaner type syntax (PEP 695).
3. Generics and TypeVar
from typing import TypeVar

T = TypeVar("T")

def first_legacy(items: list[T]) -> T:  # pre-3.12: module-level TypeVar
    return items[0]

def first[T](items: list[T]) -> T:  # 3.12+ syntax: T is scoped to this function
    return items[0]
Use generics on containers, factories, and any function whose return type depends on input type.
4. Protocols (structural typing)
ABCs (abc.ABC) require explicit inheritance: that is nominal typing. Protocols ask "does this object have these methods?" at type-check time: that is structural typing, the same idea as Go and TypeScript interfaces.
from typing import Protocol

class Readable(Protocol):
    def read(self, n: int = -1) -> bytes: ...

def consume(src: Readable) -> bytes:
    return src.read()

# Works with file, BytesIO, anything with a .read() method; no inheritance required.
Use Protocols when:
- Defining duck-typed APIs.
- Decoupling from a specific class hierarchy.
- Mocking — your test double satisfies the Protocol; no inheritance ceremony.
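The mocking bullet can be sketched as follows. FakeSource is a hypothetical test double; it never inherits from Readable, yet it satisfies the Protocol because the method shape matches. The @runtime_checkable decorator is optional and only adds an isinstance() check for method presence (it does not verify signatures).

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    def read(self, n: int = -1) -> bytes: ...


class FakeSource:
    """Test double: no base class, just a matching read() method."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def read(self, n: int = -1) -> bytes:
        return self._payload if n < 0 else self._payload[:n]


def consume(src: Readable) -> bytes:
    return src.read()


from_fake = consume(FakeSource(b"hello"))
from_stdlib = consume(io.BytesIO(b"world"))   # a real stream works the same way
```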
5. Dataclasses
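from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class Point:
    x: float
    y: float
    label: str = "anon"
    # compare=False keeps the unhashable dict out of __eq__ and __hash__,
    # so a Point can still be used as a dict key or cache key.
    metadata: dict = field(default_factory=dict, compare=False)

p = Point(1.0, 2.0)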
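- frozen=True: immutable, and hashable as long as every compared field is itself hashable (good for cache keys).
- slots=True (3.10+): no __dict__; smaller memory footprint and typically somewhat faster attribute access.
- field(default_factory=...): for mutable defaults; never write a literal mutable default such as tags: list = [], which dataclasses reject with a ValueError.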
Default to dataclass for plain data carriers. Reach for Pydantic when you need validation/parsing from JSON.
6. Pydantic v2
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    email: str = Field(pattern=r"[^@]+@[^@]+\.[^@]+")
    age: int | None = None

u = User.model_validate_json('{"id": 1, "email": "a@b.com"}')
Use Pydantic at the edges of your system (HTTP request parsing, config loading). Use dataclasses internally. Mixing them inside business logic creates redundant validation.
7. Comprehensions, generators, the itertools toolbox
# users is assumed to be an iterable of objects with .id and .email attributes.
squares = [x*x for x in range(10)]
even_squares = [x*x for x in range(10) if x % 2 == 0]
lookup = {u.id: u for u in users}
unique_emails = {u.email for u in users}

# Generator (lazy, low memory)
def stream_squares(n):
    for x in range(n):
        yield x * x

from itertools import chain, groupby, accumulate, pairwise  # pairwise: 3.10+
Generators are the killer feature for ETL — process TB-sized streams without loading into RAM.
8. Context managers
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    t = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether the block succeeds or raises.
        print(f"{label}: {time.perf_counter() - t:.3f}s")

with timed("query"):
    rows = db.fetch(...)
Use them for: timing, transactions, locks, temp-file cleanup, mocking. Anything with "set up, do work, always tear down" shape.
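The transaction case from that list might look like the sketch below. FakeConnection is a hypothetical stand-in that only records which methods were called; a real driver's API will differ. The point is the shape: commit on success, roll back and re-raise on failure.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection; records the calls it receives."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def commit(self) -> None:
        self.log.append("commit")

    def rollback(self) -> None:
        self.log.append("rollback")


@contextmanager
def transaction(conn: FakeConnection) -> Iterator[FakeConnection]:
    try:
        yield conn            # the body of the `with` block runs here
    except Exception:
        conn.rollback()       # failure path: undo the work...
        raise                 # ...and let the caller see the original error
    else:
        conn.commit()         # success path


ok = FakeConnection()
with transaction(ok) as c:
    c.log.append("insert")

failed = FakeConnection()
try:
    with transaction(failed) as c:
        c.log.append("insert")
        raise RuntimeError("boom")
except RuntimeError:
    pass
```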
9. Dunder methods
Implement the protocol the language expects:
| Want | Dunder |
|---|---|
| len(x) | __len__ |
| for ... in x | __iter__ |
| x[i] | __getitem__ |
| x == y | __eq__ |
| hash(x) | __hash__ (must agree with __eq__: equal objects need equal hashes) |
| x + y | __add__ |
| print(x) | __str__ (user); __repr__ (dev) |
| with x: | __enter__ + __exit__ |
| x() | __call__ |
Always give data-holding classes a __repr__; debugging without one is misery. @dataclass generates it for you, but hand-written classes need it spelled out.
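Several rows of the table can be seen together in one small class. Playlist is an invented example; it stores its items in a tuple so that __hash__ stays consistent with __eq__.

```python
from collections.abc import Iterator


class Playlist:
    def __init__(self, *tracks: str) -> None:
        self._tracks = tuple(tracks)          # immutable storage keeps hashing safe

    def __len__(self) -> int:                 # len(x)
        return len(self._tracks)

    def __iter__(self) -> Iterator[str]:      # for ... in x
        return iter(self._tracks)

    def __getitem__(self, index: int) -> str:  # x[i]
        return self._tracks[index]

    def __eq__(self, other: object) -> bool:  # x == y
        if not isinstance(other, Playlist):
            return NotImplemented
        return self._tracks == other._tracks

    def __hash__(self) -> int:                # hash(x), built from the same data as __eq__
        return hash(self._tracks)

    def __add__(self, other: "Playlist") -> "Playlist":  # x + y
        return Playlist(*self._tracks, *other._tracks)

    def __repr__(self) -> str:                # developer-facing text
        return f"Playlist({', '.join(map(repr, self._tracks))})"


mix = Playlist("intro", "verse") + Playlist("outro")
```

Defining __eq__ without __hash__ makes a class unhashable, which is why the two are implemented as a pair here.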
10. The performance escape hatches
Python is slow; sometimes you need fast. Order of attempt:
1. numpy / pandas / polars: vectorise. 100x easy.
2. numba @jit: JIT compile a hot loop. 10–100x for numeric code.
3. cython: compile a module. Static types optional, escape GIL with nogil:.
4. pybind11 / cffi: bind C/C++ for true native speed.
5. Rewrite the hot path in Rust (pyo3). Modern teams' choice.
Profile before optimising. cProfile + snakeviz for CPU; tracemalloc for memory; py-spy for prod sampling.
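A minimal standard-library sketch of "profile first": cProfile with pstats for CPU time and tracemalloc for memory. The function slow_sum_of_squares is a made-up hot spot; snakeviz and py-spy are separate tools and are not shown here.

```python
import cProfile
import io
import pstats
import tracemalloc


def slow_sum_of_squares(n: int) -> int:
    """Deliberately naive loop that will show up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# CPU: run the function under the profiler and capture the report as text.
profiler = cProfile.Profile()
result = profiler.runcall(slow_sum_of_squares, 50_000)

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
cpu_report = buffer.getvalue()

# Memory: measure allocations made while the list is built.
tracemalloc.start()
squares = [i * i for i in range(50_000)]
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Printing cpu_report shows where the time went; only after that does it make sense to pick one of the escape hatches above.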
11. A short C++ contrast
| Concept | Python | Modern C++ |
|---|---|---|
| Resource cleanup | with / __exit__ | RAII (destructors) |
| Polymorphism | duck typing / Protocols | virtual functions / concept (C++20) |
| Generics | TypeVar / Protocols | templates / concept |
| Immutability | frozen=True dataclass | const, constexpr |
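| Threads | GIL serialises CPU-bound threads; use multiprocessing for CPU work, asyncio or threads for I/O | true parallel threads + std::atomic |
| Memory | reference counting + cycle-collecting GC | deterministic ownership via unique_ptr / shared_ptr (RAII) |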
| Build | pip install | cmake / vcpkg (sigh) |
C++20 concepts are essentially compile-time Protocols. The convergence is real.
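On the Python side, the closest analogue to constraining a template with a concept is a TypeVar bounded by a Protocol. The sketch below uses invented names (SupportsLessThan, smallest): the bound says "any type with a usable <", which a type checker verifies statically, much as a C++ compiler would check a concept-constrained template parameter.

```python
from collections.abc import Iterable
from typing import Any, Protocol, TypeVar


class SupportsLessThan(Protocol):
    """Structural requirement: the type must define the < operator."""

    def __lt__(self, other: Any, /) -> bool: ...


T = TypeVar("T", bound=SupportsLessThan)


def smallest(items: Iterable[T]) -> T:
    """Return the minimum element; works for any T that satisfies the Protocol."""
    iterator = iter(items)
    try:
        best = next(iterator)
    except StopIteration:
        raise ValueError("smallest() needs at least one item") from None
    for item in iterator:
        if item < best:
            best = item
    return best


lowest_int = smallest([3, 1, 2])
lowest_str = smallest(["pear", "apple"])
# smallest([object(), object()])   # a type checker rejects this: object has no __lt__
```

One difference worth keeping in mind: Python checks the constraint only when a type checker is run, while C++ enforces it on every compile.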
12. Reality check
Idiomatic Python checklist for a new project:
- Strict mode: mypy --strict (or pyright strict).
- Dataclasses or Pydantic: pick per layer, don't mix internally.
- ruff for lint + format (replaces black/isort/flake8 in 1 tool).
- pytest with pytest-cov, target 80% coverage on critical paths.
- Pre-commit hook: ruff + mypy + pytest fast tier.
You won't regret typed Python at 6 months. Lots of teams regret not adopting it earlier.
Reading material
Books:
- Fluent Python, 2nd ed. — Luciano Ramalho (the canonical "write Python like Python" book; the chapters on protocols + ABCs are gold)
- Effective Python, 2nd ed., by Brett Slatkin (90 specific actionable items; the modern Python style guide)
- Python Type Hints — Patrick Viafore (the dedicated typing book; PEP-by-PEP)
- Architecture Patterns with Python — Harry Percival & Bob Gregory (ports & adapters in Python; protocols in action)
- Robust Python — Patrick Viafore (the modern "Python for production" book; types, contracts, plug-ins)
Papers / PEPs:
- PEP 484 — Type Hints (Guido van Rossum et al. 2014) — the foundational PEP.
- PEP 544 — Protocols: Structural subtyping (Ivan Levkivskyi 2017) — duck typing meets static checking.
- PEP 695 — Type Parameter Syntax (Eric Traut 2022) — the Python 3.12 generic-class syntax.
- PEP 692 — Using TypedDict for kwargs typing (Franek Magiera 2022) — the modern way to type **kwargs.
- PEP 702: Marking deprecations using the type system (Jelle Zijlstra 2023), the @deprecated decorator.
Official docs:
- Python typing module docs: the API reference; the place to find every primitive.
- mypy docs: the original PEP-484 type checker.
- Pyright docs — Microsoft's faster type checker (powers Pylance).
- Pydantic v2 docs — runtime validation built on typing.
- Black — The uncompromising code formatter — the formatting end of the toolchain.
- Ruff docs — the modern Rust-based linter + formatter (replaces Black + flake8 + isort).
Blog posts:
- Real Python — Python Type Checking (Guide) — the long-form, code-heavy walk-through.
- Glyph — The One Python Library Everyone Needs (attrs / dataclasses) — the design philosophy behind modern Python objects.
- Itamar Turner-Trauring — Best practices for software engineers (in Python) — the production-Python blog.
In-depth research material
- cpython (github.com/python/cpython): ~62k ★, the reference implementation; read Lib/typing.py to understand types from the inside.
- mypy (github.com/python/mypy): ~18k ★, the canonical static type checker.
- pyright — github.com/microsoft/pyright — ~13k ★, the speed-of-light Microsoft checker.
- pydantic — github.com/pydantic/pydantic — ~22k ★, runtime validation on typing primitives.
- ruff — github.com/astral-sh/ruff — ~34k ★, the modern linter that replaced everything.
- hypothesis — github.com/HypothesisWorks/hypothesis — ~7k ★, property-based testing that pairs beautifully with types.
- attrs — github.com/python-attrs/attrs — ~5k ★, the predecessor to dataclasses; still richer.
- Awesome Python Typing — github.com/typeddjango/awesome-python-typing — the curated list of typed Python libraries.
- Real Python — The Ultimate Guide to Python Type Checking — the long-form walkthrough.
- Daniele Procida — The grand unified theory of documentation — the canonical model for documenting typed Python.
Videos
- Python Typing Deep Dive — mCoding (James Murphy) · 40 min — best modern "why types in Python" walkthrough.
- Python 3.12: The Most Important New Features — mCoding — mCoding · 16 min — PEP 695 generic syntax and friends.
- Type-Driven Design in Python — David Beazley — David Beazley · 50 min — the legendary Python teacher on types.
- Modern Python Type Hints (PEP 695, 698) — ArjanCodes — ArjanCodes · 27 min — the practical PEP-walk-through with code.
- Beyond Pep 8 — Raymond Hettinger (PyCon) — Raymond Hettinger · 53 min — the classic that taught a generation what "idiomatic Python" actually means.
LeetCode — Design HashMap
- Link: https://leetcode.com/problems/design-hashmap/
- Difficulty: Easy
- Why this problem: Implement __getitem__ / __setitem__ / __delitem__ from scratch, the dict-like Protocol in disguise.
- Time-box: 20 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Write a generic function with TypeVar (and 3.12+ syntax).
- Pick between Protocol and ABC for a given API surface.
- Use dataclasses with frozen=True, slots=True and field(default_factory=...).
- Implement __iter__, __len__, __eq__, __hash__ correctly.
- Pick from numpy → numba → cython → C++ binding as performance escape hatches.
- Solve design-hashmap: basic open addressing or chaining; the data-model contract.