Some Useful Python Dataclass Patterns
I’ve been updating and upgrading some of my favorite CLI tools recently, partly with the help of dataclasses. Most of these tools need configuration options, and dataclasses offer a slick syntax for managing them. Here are a few patterns that I’ve come across and found super useful.
Dynamic CLI Options
Whenever I start from scratch with a Python CLI, I almost always use the click library. It’s so nice that I will also wrap other tools in click CLIs.
Take kicad-auto-silkscreen as an example. We have a configuration dataclass, SilkscreenConfig, which is passed to the main class, AutoSilkscreen.
The Dataclass:
from dataclasses import dataclass

@dataclass
class SilkscreenConfig:
    max_allowed_distance: float = 5.0  # mm
    method: str = "anneal"
    step_size: float = 0.1  # mm
    only_process_selection: bool = False
    ignore_vias: bool = True
    deflate_factor: float = 1.0
    maxiter: int = 100
    debug: bool = True
The CLI is intended to be lightweight, so adding each field to the CLI, SilkscreenConfig, and AutoSilkscreen seems a bit repetitive. Additionally, you would need to maintain the default values in two places.
Instead, we can dynamically generate the CLI options from the fields of the dataclass, which keeps everything in sync.
Dynamic CLI Generation:
from dataclasses import fields

import click

class DynamicSilkscreenCommand(click.Command):
    def __init__(self, *args, **kwargs):
        # Dynamically add options based on the dataclass fields
        for field in fields(SilkscreenConfig):
            option_name = f'--{field.name.replace("_", "-")}'
            field_type = field.type
            default_value = field.default
            if field_type is bool:
                # Handle boolean flags as an on/off pair (e.g. --debug/--no-debug),
                # so fields that default to True can be switched off explicitly
                flag_decl = f"{option_name}/--no-{option_name[2:]}"
                option = click.Option(param_decls=[flag_decl], is_flag=True, default=default_value)
            else:
                # For other types, use the appropriate click type (str, int, float, etc.)
                option = click.Option(param_decls=[option_name], default=default_value, type=field_type)
            kwargs.setdefault('params', []).append(option)
        super().__init__(*args, **kwargs)

@click.command(cls=DynamicSilkscreenCommand)
@click.option("--board", type=str, required=True)
@click.option("--out", type=str, required=True)
def main(board, out, **config_options):
    # Instantiate the configuration dataclass
    config = SilkscreenConfig(**config_options)
    click.echo(f"Config: {config}")
    click.echo(f"Board: {board}")
    click.echo(f"Output file: {out}")
Explanation:
- The DynamicSilkscreenCommand class dynamically adds a click.Option for each field in the SilkscreenConfig dataclass.
- Each field’s name and type are read from the dataclass, and a click.Option is created accordingly. This means if you add or remove fields from SilkscreenConfig, the CLI will adjust automatically.
- The default values from the dataclass are used unless overridden by the user via the command line.
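To see the generated options in action, here is a minimal sketch that uses click's built-in test runner, click.testing.CliRunner. The DemoConfig dataclass, the options_from helper, and the demo command are hypothetical stand-ins for illustration, not code from kicad-auto-silkscreen. The sketch assumes every field has a simple default and that the annotations are real classes rather than strings.

```python
from dataclasses import dataclass, fields

import click
from click.testing import CliRunner


@dataclass
class DemoConfig:  # hypothetical stand-in for SilkscreenConfig
    step_size: float = 0.1
    maxiter: int = 100
    method: str = "anneal"


def options_from(config_cls):
    """Build one click.Option per dataclass field (assumes simple defaults)."""
    return [
        click.Option(
            [f'--{f.name.replace("_", "-")}'],
            default=f.default,
            type=f.type,
        )
        for f in fields(config_cls)
    ]


class DemoCommand(click.Command):
    def __init__(self, *args, **kwargs):
        # Append the generated options to whatever the decorators collected
        kwargs.setdefault("params", []).extend(options_from(DemoConfig))
        super().__init__(*args, **kwargs)


@click.command(cls=DemoCommand)
def demo(**config_options):
    click.echo(DemoConfig(**config_options))


runner = CliRunner()
defaults = runner.invoke(demo, [])
overridden = runner.invoke(demo, ["--step-size", "0.5", "--maxiter", "20"])
print(defaults.output)    # config built purely from the dataclass defaults
print(overridden.output)  # step_size and maxiter replaced by the CLI values
```

Invoking the command through CliRunner is also a handy way to unit test a CLI like this without spawning a subprocess.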
Reading Config Files With Defaults
Another useful pattern is handling configuration files, such as TOML files, and automatically using default values defined in the dataclass. I’m a fan of TOML for configuration files. Values carry their own types (strings, integers, floats, booleans), and since Python 3.11 the standard library’s tomllib parses them, which takes care of the first layer of validation.
In LC120LaserNoise, I needed to pass configuration fields for the photoreceiver, laser, oscilloscope, and measurement. The goal was to load a TOML file while using the default values from the dataclasses and only throwing an exception if a required field was missing.
The TOML file:
[photoreceiver]
transimpedance=1e6
bandwidth=750e3
[laser]
baudrate = 115200
[oscilloscope]
channel=1
timescale="100ns"
scale="1V"
[measurement]
path = "./data"
name = "demo"
continue_on_restart = true
repetitions = 1
ntemp = 10
ncurrent = 100
temp_sleep = 0
current_sleep = 0
temp_min = 0
temp_max = 35
temp_step = 1
current_min = 1e-3
current_max = 100e-3
current_step = 1e-3
measurement_sleep = 10e-3
The Code:
import tomllib
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PhotorecieverConfig:
    bandwidth: float
    transimpedance: float

@dataclass
class LaserConfig:
    baudrate: int = 115200

@dataclass
class OscilloscopeConfig:
    channel: int = 1
    timescale: str = "100us"  # s/div
    attenuation: float = 1
    scale: float = 1  # V/DIV
    coupling: str = "D1M"
    offset: str = "0"
    averages: int = 1
    npoints: int = 14000

@dataclass
class MeasurementConfig:
    path: str
    name: str = "untitled"
    continue_on_restart: bool = False
    repetitions: int = 1
    ntemp: int = 10
    ncurrent: int = 100
    temp_sleep: float = 1
    current_sleep: float = 0.05
    temp_min: float = 0
    temp_max: float = 35
    temp_step: float = 35
    current_min: float = 1e-3
    current_max: float = 100e-3
    current_step: float = 1e-3
    measurement_sleep: float = 10e-3  # Time between samples

    @property
    def run_path(self):
        return Path(self.path) / self.name

@dataclass
class Config:
    laser: LaserConfig
    oscilloscope: OscilloscopeConfig
    measurement: MeasurementConfig
    photoreceiver: PhotorecieverConfig

def load_config(path: str) -> Config:
    # tomllib requires the file to be opened in binary mode
    with open(path, 'rb') as f:
        raw = tomllib.load(f)
    return Config(
        laser=LaserConfig(**raw["laser"]),
        oscilloscope=OscilloscopeConfig(**raw['oscilloscope']),
        measurement=MeasurementConfig(**raw['measurement']),
        photoreceiver=PhotorecieverConfig(**raw['photoreceiver'])
    )
Explanation:
- The TOML configuration is loaded using Python’s built-in tomllib.
- The Config class and its nested dataclasses are used to map the configuration data.
- If a required field (one without a default, such as path in MeasurementConfig) is missing from the TOML file, the dataclass constructor raises a TypeError. If an optional field is missing (e.g., baudrate in LaserConfig), its default value is used automatically.
This approach avoids the need for redundant configuration validation code and makes it easy to add new fields to the configuration by simply updating the dataclass. It’s a simple and transparent method that avoids duplication, and it beats messing around with raw nested dictionaries.
Any patterns you recommend?