Improving Python CLIs with Pydantic and Dataclasses
I’ve been updating and upgrading some of my favorite CLI tools recently and writing a few parametric tools that benefit from fairly long configuration files. I was originally using vanilla dataclasses with some organic, homemade, grass-fed validation code until I caved and used Pydantic.
Pydantic builds on the dataclass foundation but makes clearly specifying and validating structured data input surprisingly pleasant. Say you’re using JSON, TOML, YAML, S-expressions, XLSX, or whatever as input. At some point you’re going to turn that into a dictionary or a dataframe. With Pydantic you can describe the whole nested structure as classes with other classes as members and shove the entire dictionary into it. If the data passes validation, you’ll end up with an instance of your schema that looks and feels like a dataclass.
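As a minimal sketch of that idea (the Probe and Experiment models and the raw dictionary are made up for illustration, not taken from any of my tools), a nested dictionary can be handed straight to the top-level model:

```python
from pydantic import BaseModel


class Probe(BaseModel):  # hypothetical nested section
    channel: int
    gain: float = 1.0  # used when the input omits it


class Experiment(BaseModel):  # hypothetical top-level schema
    name: str
    probe: Probe  # a model used as a member of another model


# Stand-in for a dictionary parsed from JSON, TOML, YAML, ...
raw = {"name": "demo", "probe": {"channel": "2"}}

experiment = Experiment(**raw)
# The nested dict became a Probe instance and "2" was converted to an int
print(type(experiment.probe).__name__, experiment.probe.channel, experiment.probe.gain)
```

The result is accessed with plain attribute lookups, just like a dataclass instance.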
Dynamic CLI Options
Whenever I start from scratch with a Python CLI, I almost always use the click library. It’s so nice I will also wrap other tools in click CLIs.
Take kicad-auto-silkscreen as an example.
We have a configuration dataclass SilkscreenConfig which is passed to the main class AutoSilkscreen.
I’m not using Pydantic here as it would mostly just add an extra dependency, but the same process could be used with extra type checking.
Note: There are several CLI libraries which have direct dataclass and Pydantic support built in. Typer is both popular and maintained by the creator of FastAPI, so I would look there first.
The Dataclass:
from dataclasses import dataclass


@dataclass
class SilkscreenConfig:
    max_allowed_distance: float = 5.0  # mm
    method: str = "anneal"
    step_size: float = 0.1  # mm
    only_process_selection: bool = False
    ignore_vias: bool = True
    deflate_factor: float = 1.0
    maxiter: int = 100
    debug: bool = True
The CLI is intended to be lightweight, so adding each field to the CLI, SilkscreenConfig, and AutoSilkscreen seems a bit repetitive. Additionally, you would need to handle the default values in two places.
Instead, we can dynamically generate the CLI options based on the fields in the dataclass, which keeps everything in sync.
We can also use Pydantic here but click will already do the type casting for us.
Dynamic CLI Generation:
from dataclasses import fields

import click


class DynamicSilkscreenCommand(click.Command):
    def __init__(self, *args, **kwargs):
        # Dynamically add options based on the dataclass fields
        for field in fields(SilkscreenConfig):
            option_name = f'--{field.name.replace("_", "-")}'
            # Assumes plain annotations (str, int, float, bool), not string annotations
            field_type = field.type
            # Every field in SilkscreenConfig has a simple default value
            default_value = field.default

            # Handle boolean flags
            if field_type is bool:
                option = click.Option(param_decls=[option_name], is_flag=True, default=default_value)
            else:
                # For other types, use the appropriate click type (str, int, float, etc.)
                option = click.Option(param_decls=[option_name], default=default_value, type=field_type)

            # Options declared with @click.option are already in kwargs['params']
            kwargs.setdefault('params', []).append(option)

        super().__init__(*args, **kwargs)


@click.command(cls=DynamicSilkscreenCommand)
@click.option("--board", type=str, required=True)
@click.option("--out", type=str, required=True)
def main(board, out, **config_options):
    # Instantiate the configuration dataclass
    config = SilkscreenConfig(**config_options)
    click.echo(f"Config: {config}")
    click.echo(f"Board: {board}")
    click.echo(f"Output file: {out}")
Explanation:
- The DynamicSilkscreenCommand class dynamically adds a click.Option for each field in the SilkscreenConfig dataclass.
- Each field’s name and type are read from the dataclass, and a click.Option is created accordingly. This means if you add or remove fields from SilkscreenConfig, the CLI will adjust automatically.
- The default values from the dataclass are used unless overridden by the user via the command line.
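A compact variant of the same idea, as a sketch rather than what kicad-auto-silkscreen actually ships: DemoConfig and options_from_dataclass are hypothetical names, and the options are collected in a plain function instead of a Command subclass. Booleans are declared as on/off pairs (--debug/--no-debug), which is one way to make sure a field that defaults to True can still be switched off explicitly. click.testing.CliRunner is only there to exercise the command in-process.

```python
from dataclasses import dataclass, fields

import click
from click.testing import CliRunner


@dataclass
class DemoConfig:  # hypothetical stand-in for SilkscreenConfig
    step_size: float = 0.1
    maxiter: int = 100
    debug: bool = True


def options_from_dataclass(cls):
    """Build one click.Option per dataclass field (hypothetical helper)."""
    options = []
    for field in fields(cls):
        name = field.name.replace("_", "-")
        if field.type is bool:
            # "--debug/--no-debug" declares an explicit on/off pair
            options.append(click.Option([f"--{name}/--no-{name}"], default=field.default))
        else:
            options.append(click.Option([f"--{name}"], type=field.type, default=field.default))
    return options


def run(**config_options):
    # click passes one keyword argument per generated option
    config = DemoConfig(**config_options)
    click.echo(repr(config))


cli = click.Command("demo", callback=run, params=options_from_dataclass(DemoConfig))

# Invoke the command in-process, overriding two of the three defaults
result = CliRunner().invoke(cli, ["--no-debug", "--maxiter", "5"])
print(result.output)
```

Fields that are not mentioned on the command line (step_size here) keep the default from the dataclass.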
Reading Config Files With Defaults
Another useful pattern is handling configuration files, such as TOML files, and automatically using default values defined in the schema. I’m a fan of TOML for configuration files. Values are typed, so parsing with the standard library already takes care of the first layer of validation.
In LC120LaserNoise, I needed to pass configuration fields for the photoreceiver, laser, oscilloscope, and measurement. If there is an incompatible type in the config file, an error is raised immediately. The rest of the code can then work with a type-checked and validated object instead of a nested dictionary, with no need for extra validation and data-processing code.
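A minimal sketch of what "raised immediately" looks like in practice. ScopeSection and DemoConfig are hypothetical, cut-down models, and the bad value is invented for the example:

```python
from pydantic import BaseModel, ValidationError


class ScopeSection(BaseModel):  # hypothetical, mirrors an [oscilloscope] table
    channel: int = 1


class DemoConfig(BaseModel):
    oscilloscope: ScopeSection


# "first" cannot be interpreted as an integer
bad = {"oscilloscope": {"channel": "first"}}

try:
    DemoConfig(**bad)
except ValidationError as exc:
    # Each error records the path to the offending value
    locations = [error["loc"] for error in exc.errors()]
    print(locations)
    print(exc)
```

The error is raised while the model is being constructed, so nothing downstream ever sees the bad value.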
The TOML file:
[laser]
baudrate = 115200
[photoreceiver]
transimpedance=1e6
bandwidth=750e3
[oscilloscope]
channel=1
timescale="100ns"
scale="1V"
[measurement]
path = "./data"
name = "demo"
continue_on_restart = true
repetitions = 1
ntemp = 10
ncurrent = 100
temp_sleep = 0
current_sleep = 0
temp_min = 0
temp_max = 35
temp_step = 1
current_min = 1e-3
current_max = 100e-3
current_step = 1e-3
measurement_sleep = 10e-3
The Code:
import tomllib
from pathlib import Path

from pydantic import BaseModel


class PhotoreceiverConfig(BaseModel):
    bandwidth: float
    transimpedance: float


class LaserConfig(BaseModel):
    baudrate: int = 115200


class OscilloscopeConfig(BaseModel):
    channel: int = 1
    timescale: str = "100us"  # s/div
    attenuation: float = 1
    scale: str = "1V"  # V/div
    coupling: str = "D1M"
    offset: str = "0V"
    averages: int = 1
    npoints: int = 14000


class MeasurementConfig(BaseModel):
    path: Path
    name: str = "untitled"
    continue_on_restart: bool = False
    repetitions: int = 1
    ntemp: int = 10
    ncurrent: int = 100
    temp_sleep: float = 1
    current_sleep: float = 0.05
    temp_min: float = 0
    temp_max: float = 35
    temp_step: float = 35
    current_min: float = 1e-3
    current_max: float = 100e-3
    current_step: float = 1e-3
    measurement_sleep: float = 10e-3  # Time in between samples

    @property
    def run_path(self) -> Path:
        # Absolute directory for this run: <path>/<name>
        return (Path(self.path) / self.name).absolute()


class Config(BaseModel):
    laser: LaserConfig
    oscilloscope: OscilloscopeConfig
    measurement: MeasurementConfig
    photoreceiver: PhotoreceiverConfig


def load_config(path: str) -> Config:
    # tomllib requires the file to be opened in binary mode
    with open(path, 'rb') as f:
        raw = tomllib.load(f)
    return Config(**raw)
Explanation:
- The TOML configuration is loaded using Python’s built-in tomllib (available since Python 3.11). Since Pydantic only sees the resulting dictionary and does the type casting automatically, adding other input formats is trivial.
- The Config class and its nested models describe the structure of the configuration data.
- If a required field is missing from the TOML file, Pydantic raises a validation error; if a field with a default is missing from a nested model (e.g., baudrate in LaserConfig), the default value is used automatically.
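One detail worth spelling out: a nested model declared without a default is itself a required field, so leaving out a whole table would still fail even if every field inside it has a default. The sketch below uses hypothetical, cut-down models (LaserSection, ReceiverSection, DemoConfig) and shows one possible way to make such a section optional with Field(default_factory=...); it is a suggestion, not how LC120LaserNoise is written.

```python
from pydantic import BaseModel, Field, ValidationError


class LaserSection(BaseModel):  # hypothetical, mirrors LaserConfig
    baudrate: int = 115200


class ReceiverSection(BaseModel):  # hypothetical, mirrors PhotoreceiverConfig
    bandwidth: float
    transimpedance: float


class DemoConfig(BaseModel):
    # Optional section: built from its own defaults when the table is absent
    laser: LaserSection = Field(default_factory=LaserSection)
    # Required section: its fields have no defaults
    photoreceiver: ReceiverSection


# No "laser" key at all, yet the defaults are still filled in
config = DemoConfig(photoreceiver={"bandwidth": 750e3, "transimpedance": 1e6})
print(config.laser.baudrate)

# A missing required field is reported with its full path
missing = []
try:
    DemoConfig(photoreceiver={"bandwidth": 750e3})
except ValidationError as exc:
    missing = [error["loc"] for error in exc.errors()]
print(missing)
```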
This approach avoids the need for redundant configuration validation code and makes it easy to add new fields to the configuration by simply updating the model. It’s a simple and transparent method that avoids duplication, and it beats messing around with raw nested dictionaries.
Any patterns or applications you recommend?