Some Useful Python Dataclass Patterns

Effortless CLI options and config management using Python dataclasses.

I’ve been updating and upgrading some of my favorite CLI tools recently, partly with the help of dataclasses. Most of these tools need configuration options, and dataclasses offer a slick syntax for managing them. Here are a few patterns I’ve come across and found super useful.

Dynamic CLI Options

Whenever I start a Python CLI from scratch, I almost always reach for the click library. It’s so nice that I also wrap other tools in click CLIs. Take kicad-auto-silkscreen as an example: a configuration dataclass, SilkscreenConfig, is passed to the main class, AutoSilkscreen.

The Dataclass:

from dataclasses import dataclass

@dataclass
class SilkscreenConfig:
    max_allowed_distance: float = 5.0   # mm
    method: str = "anneal"
    step_size: float = 0.1              # mm
    only_process_selection: bool = False
    ignore_vias: bool = True
    deflate_factor: float = 1.0
    maxiter: int = 100
    debug: bool = True

The CLI is intended to be lightweight, so adding each field to the CLI, SilkscreenConfig, and AutoSilkscreen seems repetitive. Additionally, you would need to maintain the default values in two places. Instead, we can dynamically generate the CLI options from the fields in the dataclass, which keeps everything in sync.

Dynamic CLI Generation:

from dataclasses import fields
import click

class DynamicSilkscreenCommand(click.Command):
    def __init__(self, *args, **kwargs):
        # Options declared with @click.option arrive in kwargs["params"];
        # append one generated option per dataclass field.
        params = kwargs.setdefault("params", [])
        for field in fields(SilkscreenConfig):
            option_name = f'--{field.name.replace("_", "-")}'
            # Assumes every field declares a plain default (no default_factory)
            default_value = field.default

            if field.type is bool:
                # Paired on/off flag (e.g. --debug/--no-debug) so that a
                # True default can still be switched off explicitly
                flag_decl = f"{option_name}/--no-{option_name[2:]}"
                option = click.Option([flag_decl], is_flag=True, default=default_value)
            else:
                # For other types, the annotation (str, int, float) doubles as the click type.
                # Assumes annotations are real classes, not strings.
                option = click.Option([option_name], default=default_value, type=field.type)

            params.append(option)
        super().__init__(*args, **kwargs)


@click.command(cls=DynamicSilkscreenCommand)
@click.option("--board", type=str, required=True)
@click.option("--out", type=str, required=True)
def main(board, out, **config_options):
    # Instantiate the configuration dataclass
    config = SilkscreenConfig(**config_options)
    click.echo(f"Config: {config}")
    click.echo(f"Board: {board}")
    click.echo(f"Output file: {out}")

Explanation:

  • The DynamicSilkscreenCommand class dynamically adds a click.Option for each field in the SilkscreenConfig dataclass.
  • Each field’s name and type are read from the dataclass, and click.Option is created accordingly. This means if you add or remove fields from SilkscreenConfig, the CLI will adjust automatically.
  • The default values from the dataclass are used unless overridden by the user via the command line.
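
The same field-driven idea is not tied to click. As a minimal sketch, assuming a trimmed-down copy of SilkscreenConfig and the standard library’s argparse swapped in for click, the options can be generated in a few lines. Note that build_parser is a hypothetical helper name used for illustration, not part of kicad-auto-silkscreen, and argparse.BooleanOptionalAction requires Python 3.9 or newer.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class SilkscreenConfig:
    # Trimmed copy of the config above, only to keep the sketch short
    max_allowed_distance: float = 5.0  # mm
    method: str = "anneal"
    ignore_vias: bool = True
    maxiter: int = 100


def build_parser(config_cls) -> argparse.ArgumentParser:
    """Hypothetical helper: one argparse option per dataclass field."""
    parser = argparse.ArgumentParser()
    for field in fields(config_cls):
        flag = "--" + field.name.replace("_", "-")
        if field.type is bool:
            # Registers both --ignore-vias and --no-ignore-vias (Python 3.9+)
            parser.add_argument(
                flag, action=argparse.BooleanOptionalAction, default=field.default
            )
        else:
            # Assumes the annotation is a callable type such as int, float, or str
            parser.add_argument(flag, type=field.type, default=field.default)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible
args = build_parser(SilkscreenConfig).parse_args(["--maxiter", "250", "--no-ignore-vias"])
config = SilkscreenConfig(**vars(args))
print(config)
```

argparse converts the dashes in each flag back to underscores, so vars(args) can be passed straight to the dataclass constructor, just like config_options in the click version.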

Reading Config Files With Defaults

Another useful pattern is loading configuration files, such as TOML files, and automatically falling back to the default values defined in the dataclass. I’m a fan of TOML for configuration. Values come with inferred types (integers, floats, booleans, strings), and the standard library’s tomllib parser (Python 3.11+) takes care of the first layer of validation.

In LC120LaserNoise, I needed to pass configuration fields for the photoreceiver, laser, oscilloscope, and measurement. The goal was to load a TOML file while using the default values from the dataclasses, raising an exception only if a required field was missing.

The TOML file:

[photoreceiver]
transimpedance=1e6
bandwidth=750e3

[laser]
baudrate = 115200

[oscilloscope]
channel=1
timescale="100ns"
scale="1V"

[measurement]
path = "./data"
name = "demo"
continue_on_restart = true
repetitions = 1
ntemp = 10
ncurrent = 100
temp_sleep = 0
current_sleep = 0
temp_min = 0
temp_max = 35
temp_step = 1
current_min = 1e-3
current_max = 100e-3
current_step = 1e-3
measurement_sleep = 10e-3

The Code:

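from dataclasses import dataclass
from pathlib import Path
import tomllib  # standard library, Python 3.11+

@dataclass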
class PhotorecieverConfig:
    bandwidth: float
    transimpedance: float

@dataclass
class LaserConfig:
    baudrate: int = 115200

@dataclass
class OscilloscopeConfig:
    channel: int = 1
    timescale: str = "100us"  #  s/div
    attenuation: float = 1
    scale: float = 1  #  V/DIV
    coupling: str = "D1M"
    offset: str = "0"
    averages: int = 1
    npoints: int = 14000

@dataclass
class MeasurementConfig:
    path: str
    name: str = "untitled"
    continue_on_restart: bool = False
    repetitions: int = 1
    ntemp: int = 10
    ncurrent: int = 100
    temp_sleep: float = 1
    current_sleep: float = 0.05
    temp_min: float = 0
    temp_max: float = 35
    temp_step: float = 35
    current_min: float = 1e-3
    current_max: float = 100e-3
    current_step: float = 1e-3
    measurement_sleep: float = 10e-3  # Time between samples

    @property
    def run_path(self):
        return Path(self.path) / self.name

@dataclass
class Config:
    laser: LaserConfig
    oscilloscope: OscilloscopeConfig
    measurement: MeasurementConfig
    photoreceiver: PhotorecieverConfig

def load_config(path: str) -> Config:
    with open(path, 'rb') as f:
        raw = tomllib.load(f)

    return Config(
        laser=LaserConfig(**raw["laser"]),
        oscilloscope=OscilloscopeConfig(**raw['oscilloscope']),
        measurement=MeasurementConfig(**raw['measurement']),
        photoreceiver=PhotorecieverConfig(**raw['photoreceiver'])
    )

Explanation:

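  • The TOML configuration is loaded using tomllib from Python’s standard library (Python 3.11+).
  • The Config class and its nested dataclasses are used to map the configuration data.
  • If a required field (one without a default, such as path in MeasurementConfig) is missing from the TOML file, the dataclass constructor raises a TypeError. If an optional field is missing (e.g., baudrate in LaserConfig), its default value is used automatically.
  • A misspelled or unknown key also raises a TypeError, and a missing section header such as [laser] raises a KeyError, because load_config indexes raw["laser"] directly.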
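
The behavior described in the list above can be checked without a file on disk. This is a minimal sketch with shortened copies of two of the dataclasses, where plain dicts stand in for the tables that tomllib.load would return. The misspelled key repetions is deliberate.

```python
from dataclasses import dataclass


@dataclass
class LaserConfig:
    baudrate: int = 115200


@dataclass
class MeasurementConfig:
    path: str  # required: no default
    name: str = "untitled"
    repetitions: int = 1


# Plain dicts stand in for the tables returned by tomllib.load
laser = LaserConfig(**{})  # empty [laser] table: every default applies
measurement = MeasurementConfig(**{"path": "./data", "repetitions": 3})

errors = []
for bad_section in ({"name": "demo"}, {"path": "./data", "repetions": 3}):
    try:
        MeasurementConfig(**bad_section)
    except TypeError as exc:
        # Raised for a missing required field and for an unknown (misspelled) key
        errors.append(str(exc))

print(laser, measurement, *errors, sep="\n")
```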

This approach avoids hand-written checks for missing keys and default values, and it makes adding a new field to the configuration as easy as updating the dataclass. It’s a simple and transparent method that avoids duplication, and it beats messing around with raw nested dictionaries.
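
If listing every section by hand becomes tedious, one possible extension is to build the sections from the fields of Config itself. The sketch below assumes every field of Config is annotated with a dataclass and uses shortened copies of the classes. build_config is a hypothetical helper, not code from LC120LaserNoise, and treating a missing table as empty is a design choice that differs from a loader that indexes raw["laser"] directly.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class LaserConfig:
    baudrate: int = 115200


@dataclass
class MeasurementConfig:
    path: str
    name: str = "untitled"


@dataclass
class Config:
    laser: LaserConfig
    measurement: MeasurementConfig


def build_config(config_cls, raw: dict):
    """Hypothetical helper: build each nested dataclass from its matching table."""
    sections = {}
    for field in fields(config_cls):
        # Assumes annotations are real classes (no `from __future__ import annotations`)
        if not is_dataclass(field.type):
            raise TypeError(f"{field.name} is not annotated with a dataclass")
        # A missing table is treated as empty, so all-default sections may be omitted
        sections[field.name] = field.type(**raw.get(field.name, {}))
    return config_cls(**sections)


# The dict stands in for tomllib.load output; the [laser] table is absent on purpose
config = build_config(Config, {"measurement": {"path": "./data"}})
print(config)
```

With this variant, adding a new section means adding one dataclass and one field on Config. Required fields still raise a TypeError when they are missing.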


Any patterns you recommend?