Improving Python CLIs with Pydantic and Dataclasses

Effortless CLI options and config management using Pydantic and Dataclasses.

I’ve been updating and upgrading some of my favorite CLI tools recently and writing a few parametric tools that benefit from fairly long configuration files. I was originally using vanilla dataclasses with some organic, homemade, grass-fed validation code until I caved and used Pydantic.

Pydantic follows the same class-with-type-hints pattern as dataclasses but makes specifying and validating structured input data surprisingly pleasant. Say your input is JSON, TOML, YAML, S-expressions, XLSX, or whatever. At some point you’re going to turn that into a dictionary or a dataframe. With Pydantic you can describe the whole nested structure as classes with other classes as members and shove the entire dictionary into it. If validation passes, you end up with an instance of your schema that looks and feels like a dataclass.
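Here is a minimal sketch of that idea, assuming Pydantic is installed. The Sensor and Experiment models and their fields are made up for illustration and do not come from any of my tools.

```python
from pydantic import BaseModel, ValidationError


class Sensor(BaseModel):
    # Hypothetical nested section
    name: str
    gain: float = 1.0  # default used when the key is absent


class Experiment(BaseModel):
    # Hypothetical top-level schema with another model as a member
    title: str
    repetitions: int
    sensor: Sensor


# A plain nested dictionary, as produced by any JSON/TOML/YAML parser
raw = {
    "title": "demo",
    "repetitions": "3",  # a numeric string is coerced to int by default
    "sensor": {"name": "pd1"},
}

experiment = Experiment(**raw)
print(experiment.repetitions + 1)  # attribute access, already an int
print(experiment.sensor.gain)      # default filled in on the nested model

# Data that cannot be coerced is rejected with a ValidationError
try:
    Experiment(**{"title": "bad", "repetitions": "three", "sensor": {"name": "pd1"}})
except ValidationError as exc:
    print(f"rejected with {len(exc.errors())} error(s)")
```

The nested dictionary under "sensor" becomes a Sensor instance, so the rest of the code never touches raw dictionary keys.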

Dynamic CLI Options

Whenever I start from scratch with a Python CLI, I almost always use the click library. It’s so nice that I will also wrap other tools in a click CLI. Take kicad-auto-silkscreen as an example. It has a configuration dataclass, SilkscreenConfig, which is passed to the main class, AutoSilkscreen.

I’m not using Pydantic here, as it would mostly just add an extra dependency, but the same process works with Pydantic if you want the extra type checking.

Note: There are several CLI libraries with direct dataclass and Pydantic support built in. Typer is both popular and maintained by the creator of FastAPI, so I would look there first.

The Dataclass:

from dataclasses import dataclass

@dataclass
class SilkscreenConfig:
    max_allowed_distance: float = 5.0   # mm
    method: str = "anneal"
    step_size: float = 0.1              # mm
    only_process_selection: bool = False
    ignore_vias: bool = True
    deflate_factor: float = 1.0
    maxiter: int = 100
    debug: bool = True

The CLI is intended to be lightweight so adding each field to the CLI, SilkscreenConfig, and AutoSilkscreen seems a bit repetitive. Additionally you would need to handle the default values in two places. Instead, we can dynamically generate the CLI options based on the fields in the dataclass which keeps everything in sync. We can also use Pydantic here but click will already do the type casting for us.

Dynamic CLI Generation:

from dataclasses import fields
import click

class DynamicSilkscreenCommand(click.Command):
    def __init__(self, *args, **kwargs):
        # Options declared with @click.option arrive in kwargs['params'];
        # the generated options are appended to the same list.
        params = kwargs.setdefault('params', [])

        # Dynamically add options based on the dataclass fields
        for field in fields(SilkscreenConfig):
            flag = field.name.replace("_", "-")
            field_type = field.type
            # Every field in SilkscreenConfig has a plain default, so
            # field.default is safe to use here.
            default_value = field.default

            # Handle boolean flags
            if field_type is bool:
                # An on/off pair (--debug/--no-debug) lets flags that
                # default to True still be switched off.
                option = click.Option(param_decls=[f'--{flag}/--no-{flag}'], default=default_value)
            else:
                # For other types, use the appropriate click type (str, int, float, etc.)
                option = click.Option(param_decls=[f'--{flag}'], default=default_value, type=field_type)

            params.append(option)
        super().__init__(*args, **kwargs)


@click.command(cls=DynamicSilkscreenCommand)
@click.option("--board", type=str, required=True)
@click.option("--out", type=str, required=True)
def main(board, out, **config_options):
    # Instantiate the configuration dataclass
    config = SilkscreenConfig(**config_options)
    click.echo(f"Config: {config}")
    click.echo(f"Board: {board}")
    click.echo(f"Output file: {out}")

Explanation:

  • The DynamicSilkscreenCommand class dynamically adds a click.Option for each field in the SilkscreenConfig dataclass.
  • Each field’s name and type are read from the dataclass, and click.Option is created accordingly. This means if you add or remove fields from SilkscreenConfig, the CLI will adjust automatically.
  • The default values from the dataclass are used unless overridden by the user via the command line.
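To see the behavior described in these bullets without a real board file, here is a small self-contained sketch that exercises a generated CLI with click's test runner. DemoConfig and DemoCommand are hypothetical, trimmed-down stand-ins for SilkscreenConfig and DynamicSilkscreenCommand, not the actual code from kicad-auto-silkscreen.

```python
from dataclasses import dataclass, fields

import click
from click.testing import CliRunner


@dataclass
class DemoConfig:
    # Hypothetical stand-in for SilkscreenConfig with fewer fields
    step_size: float = 0.1
    maxiter: int = 100
    method: str = "anneal"


class DemoCommand(click.Command):
    def __init__(self, *args, **kwargs):
        params = kwargs.setdefault("params", [])
        # One --option per dataclass field, with the dataclass default
        for field in fields(DemoConfig):
            name = f'--{field.name.replace("_", "-")}'
            params.append(click.Option([name], type=field.type, default=field.default))
        super().__init__(*args, **kwargs)


@click.command(cls=DemoCommand)
def demo(**options):
    click.echo(DemoConfig(**options))


runner = CliRunner()
default_run = runner.invoke(demo, [])
override_run = runner.invoke(demo, ["--maxiter", "5", "--step-size", "0.5"])
print(default_run.output)   # all values come from the dataclass defaults
print(override_run.output)  # maxiter and step_size come from the command line
```

Adding a field to DemoConfig is enough to get a new option, since the command never lists the fields by name.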

Reading Config Files With Defaults

Another useful pattern is handling configuration files, such as TOML files, and automatically using default values defined in the dataclass. I’m a fan of TOML for configuration files. TOML values are typed (strings, integers, floats, booleans), so parsing with tomllib from the standard library (Python 3.11+) already takes care of the first layer of validation.

In LC120LaserNoise, I needed to pass configuration fields for the photoreceiver, laser, oscilloscope, and measurement. If a value in the config file has an incompatible type, an error is raised immediately when the config is loaded. The rest of the code can then work with a type-checked, validated object instead of a nested dictionary, with no extra validation or data-processing code.

The TOML file:

[laser]
baudrate = 115200

[photoreceiver]
transimpedance=1e6
bandwidth=750e3

[oscilloscope]
channel=1
timescale="100ns"
scale="1V"

[measurement]
path = "./data"
name = "demo"
continue_on_restart = true
repetitions = 1
ntemp = 10
ncurrent = 100
temp_sleep = 0
current_sleep = 0
temp_min = 0
temp_max = 35
temp_step = 1
current_min = 1e-3
current_max = 100e-3
current_step = 1e-3
measurement_sleep = 10e-3

The Code:

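import tomllib  # standard library, Python 3.11+
from pathlib import Path

from pydantic import BaseModel

class PhotoreceiverConfig(BaseModel):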
    bandwidth: float
    transimpedance: float

class LaserConfig(BaseModel):
    baudrate: int = 115200

class OscilloscopeConfig(BaseModel):
    channel: int = 1
    timescale: str = "100us"  #  s/div
    attenuation: float = 1
    scale: str = "1V"  #  V/DIV
    coupling: str = "D1M"
    offset: str = "0V"
    averages: int = 1
    npoints: int = 14000

class MeasurementConfig(BaseModel):
    path: Path
    name: str = "untitled"
    continue_on_restart: bool = False
    repetitions: int = 1
    ntemp: int = 10
    ncurrent: int = 100
    temp_sleep: float = 1
    current_sleep: float = 0.05
    temp_min: float = 0
    temp_max: float = 35
    temp_step: float = 35
    current_min: float = 1e-3
    current_max: float = 100e-3
    current_step: float = 1e-3
    measurement_sleep: float = 10e-3 # Time in between samples

    @property
    def run_path(self):
        return (Path(self.path) / self.name).absolute()

class Config(BaseModel):
    laser: LaserConfig
    oscilloscope: OscilloscopeConfig
    measurement: MeasurementConfig
    photoreceiver: PhotoreceiverConfig

def load_config(path: str) -> Config:
    with open(path, 'rb') as f:
        raw = tomllib.load(f)
    return Config(**raw)

Explanation:

  • The TOML configuration is loaded using tomllib from the standard library (Python 3.11+). Since Pydantic only sees the resulting dictionary and coerces the types itself, adding other input formats is trivial.
  • The Config class and its nested models (LaserConfig, OscilloscopeConfig, and so on) map the configuration data onto typed attributes.
  • If a required field is missing from the TOML file, Pydantic raises a ValidationError. If a field with a default is missing from a nested model (e.g., baudrate in LaserConfig), the default value is used automatically.
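The required-versus-default behavior can be sketched with two tiny models. LaserSection, PhotoreceiverSection, DemoSettings, and the problem_fields helper are hypothetical names that mirror the shape of the real config, and the dictionaries stand in for what a TOML parser would return.

```python
from pydantic import BaseModel, ValidationError


class LaserSection(BaseModel):
    # Hypothetical, mirrors LaserConfig: every field has a default
    baudrate: int = 115200


class PhotoreceiverSection(BaseModel):
    # Hypothetical, mirrors PhotoreceiverConfig: both fields are required
    bandwidth: float
    transimpedance: float


class DemoSettings(BaseModel):
    laser: LaserSection
    photoreceiver: PhotoreceiverSection


def problem_fields(data: dict) -> list:
    """Return dotted paths of fields that failed validation (empty if valid)."""
    try:
        DemoSettings(**data)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


# An empty laser table is fine: the default baudrate is used
complete = {"laser": {}, "photoreceiver": {"bandwidth": 750e3, "transimpedance": 1e6}}
settings = DemoSettings(**complete)
print(settings.laser.baudrate)

# Leaving out a required field is reported with its location in the nesting
incomplete = {"laser": {}, "photoreceiver": {"bandwidth": 750e3}}
print(problem_fields(incomplete))
```

Catching ValidationError in one place like this is usually enough to turn a bad config file into a readable error message at startup.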

This approach avoids redundant configuration validation code and makes it easy to add new fields to the configuration by simply updating the model. It’s a simple and transparent method that avoids duplication, and it beats messing around with raw nested dictionaries.


Any patterns or applications you recommend?