mlsysim/cli/DESIGN.md
This document establishes the unbreakable design rules for the mlsysim CLI.
If this project succeeds, it will not just be a textbook companion, but the industry standard for AI infrastructure planning (akin to Terraform or kubectl). To achieve this, the CLI must be built for automation, extensibility, and AI agents.
All code merged into mlsysim/cli/ must strictly adhere to the following five rules.
The core physics engine (mlsysim.core.solver) must never receive bad data.
EvalNodeSchema) before any core mathematical logic is invoked.
stdout vs stderr Rule
A modern CLI must serve two masters equally well: the human at the keyboard, and the machine/agent in the pipeline.
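For the validation rule above (inputs are checked against a schema before any core logic runs), here is a minimal sketch of validating at the boundary. The document names Pydantic and EvalNodeSchema; to stay dependency-free, this sketch swaps in a standard-library dataclass, and the class names and fields (EvalRequest, InputValidationError, model, hardware, batch_size) are assumptions for illustration, not the real schema.

```python
from dataclasses import dataclass


class InputValidationError(ValueError):
    """Hypothetical error raised when CLI input fails validation."""


@dataclass(frozen=True)
class EvalRequest:
    """Stand-in for a schema such as EvalNodeSchema; fields are illustrative."""
    model: str
    hardware: str
    batch_size: int = 1

    def __post_init__(self) -> None:
        for name in ("model", "hardware"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise InputValidationError(f"{name} must be a non-empty string")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.batch_size, bool) or not isinstance(self.batch_size, int):
            raise InputValidationError("batch_size must be an integer")
        if self.batch_size < 1:
            raise InputValidationError("batch_size must be >= 1")


def parse_request(raw: dict) -> EvalRequest:
    """Validate raw CLI input; only a valid EvalRequest should reach the core."""
    try:
        return EvalRequest(**raw)
    except TypeError as exc:  # missing or unexpected keys
        raise InputValidationError(str(exc)) from exc
```

With this shape, the core never sees a raw dict: it either receives a fully validated object or the CLI stops with an input error first.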
Standard Output (stdout) is exclusively for the final payload. Standard Error (stderr) is for logs, warnings, errors, and progress bars.
When someone runs mlsysim --output json eval Llama3_8B H100 > result.json, they must end up with a perfectly valid JSON file. If a progress spinner ([⠋] Calculating...) leaks into stdout, it corrupts the JSON and breaks CI/CD pipelines.
In the agentic era, scripts shouldn't have to parse text to know what went wrong.
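For the stdout/stderr rule above, a minimal sketch of keeping the two streams separate. The function names (log, emit_payload, run_eval) and the placeholder result are hypothetical; a real run would take its result from the core engine.

```python
import json
import sys
from typing import Optional, TextIO


def log(message: str, err: Optional[TextIO] = None) -> None:
    """Diagnostics (progress, warnings, errors) go to stderr only."""
    print(message, file=err if err is not None else sys.stderr)


def emit_payload(payload: dict, out: Optional[TextIO] = None) -> None:
    """The final result is the only thing written to stdout."""
    stream = out if out is not None else sys.stdout
    json.dump(payload, stream)
    stream.write("\n")


def run_eval(model: str, hardware: str,
             out: Optional[TextIO] = None,
             err: Optional[TextIO] = None) -> None:
    log(f"Calculating {model} on {hardware}...", err)
    # Placeholder result, not a real evaluation.
    result = {"model": model, "hardware": hardware, "feasible": True}
    emit_payload(result, out)
```

Because progress text never touches stdout, redirecting stdout to a file leaves only the JSON document in it, while the progress message still shows in the terminal.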
exceptions.py.
Exit 0: Success / Feasible. The system runs and meets all SLAs.
Exit 1: Bad Input. Syntax error, typo, or validation failure.
Exit 2: Physics Violation (Infeasible). The model OOMs, or the pipeline is completely starved. (A hardware limitation.)
Exit 3: SLA/Constraint Violation. The model fits, but P99 latency > 50ms, or TCO > Budget. (A business limitation.)
Exit 2 tells the developer "change your architecture," while Exit 3 tells them "ask for more budget."
The CLI UX must accurately reflect the architectural rigor of the underlying engine.
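The four exit codes listed above can be sketched as an enum plus exceptions that carry their own code. The document mentions exceptions.py but does not show its contents, so every class name here (ExitCode, MlsysimError, and its subclasses) is an assumption for illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0            # feasible, meets all SLAs
    BAD_INPUT = 1          # syntax error, typo, validation failure
    PHYSICS_VIOLATION = 2  # infeasible: OOM or starved pipeline
    SLA_VIOLATION = 3      # fits, but misses a latency or budget constraint


class MlsysimError(Exception):
    """Hypothetical base class; each subclass carries its exit code."""
    exit_code = ExitCode.BAD_INPUT


class BadInputError(MlsysimError):
    exit_code = ExitCode.BAD_INPUT


class PhysicsViolationError(MlsysimError):
    exit_code = ExitCode.PHYSICS_VIOLATION


class SLAViolationError(MlsysimError):
    exit_code = ExitCode.SLA_VIOLATION


def run(command) -> int:
    """Run a zero-argument callable and translate the outcome to an exit code."""
    try:
        command()
    except MlsysimError as exc:
        return int(exc.exit_code)
    return int(ExitCode.SUCCESS)
```

An entry point could then end with sys.exit(run(main)), so a calling script branches on the code alone instead of parsing error text.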
mlsysim eval ... strictly calls BaseModel components (Physics Engine). It can take direct flags or a full mlsys.yaml specification.
mlsysim solve ... strictly calls BaseSolver components (Math Engine).
mlsysim optimize ... strictly calls BaseOptimizer components (Engineering Engine).
The CLI should not do any math. It only formats the math.
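The one-command-to-one-component-family mapping above could be wired up roughly as follows. This is a sketch using the standard library's argparse, not the project's actual parser; the "table" output choice, the targets argument, and the COMPONENT_FAMILY table are assumptions.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mlsysim")
    # Global flag placed before the subcommand, as in `mlsysim --output json eval ...`.
    parser.add_argument("--output", choices=["table", "json"], default="table")
    sub = parser.add_subparsers(dest="command", required=True)

    eval_cmd = sub.add_parser("eval", help="evaluate direct targets or a YAML spec")
    eval_cmd.add_argument("targets", nargs="+")

    sub.add_parser("solve", help="dispatch to solver components")
    sub.add_parser("optimize", help="dispatch to optimizer components")
    return parser


# Each command maps to exactly one family of core components.
COMPONENT_FAMILY = {
    "eval": "BaseModel",
    "solve": "BaseSolver",
    "optimize": "BaseOptimizer",
}


def route(argv: List[str]) -> Tuple[str, str]:
    """Parse argv and report which component family the command belongs to."""
    args = build_parser().parse_args(argv)
    return args.command, COMPONENT_FAMILY[args.command]
```

Keeping the mapping in a single table makes it easy to check that no command reaches across into another engine's components.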
mlsysim.core.solver modules return strictly typed Pydantic objects. The CLI's only job (via renderers.py) is to translate that object into a rich.Table (for humans) or a JSON string (for machines/agents).
The ultimate goal of this CLI is to support Infrastructure as Code (IaC).
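The render-only role described above (typed object in, table or JSON out) can be sketched without third-party packages. Here a frozen dataclass stands in for the Pydantic result object and a plain aligned text layout stands in for rich.Table; EvalResult and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalResult:
    """Stand-in for a typed result object; field names are illustrative."""
    model: str
    hardware: str
    feasible: bool


def render_json(result: EvalResult) -> str:
    """Machine-facing output: one JSON object."""
    return json.dumps(asdict(result), sort_keys=True)


def render_text(result: EvalResult) -> str:
    """Human-facing output: one aligned 'name  value' row per field."""
    rows = asdict(result)
    width = max(len(key) for key in rows)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in rows.items())


def render(result: EvalResult, output: str = "table") -> str:
    """Pure formatting: no field is computed or altered here."""
    if output == "json":
        return render_json(result)
    return render_text(result)
```

Both renderers read the same object and only change its presentation, which is the property the rule asks for.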
The eval command handles both quick terminal checks and full infrastructure evaluation. When you pass mlsysim eval my_cluster.yaml, it acts like a compiler for infrastructure, taking a declarative YAML specification of Demand, Supply, and Ops Context, and verifying it against all 22 system constraints simultaneously.
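The compiler-like check described above, where a declarative spec is verified against every constraint at once, might look like the sketch below. The real mlsys.yaml schema and the 22 constraints are not shown in this document, so the spec keys, the two example constraints, and any numbers passed in are purely illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory form of a parsed spec: section name -> field -> value.
Spec = Dict[str, Dict[str, float]]
# A constraint returns None when satisfied, or a message describing the violation.
Constraint = Callable[[Spec], Optional[str]]


def fits_in_memory(spec: Spec) -> Optional[str]:
    need = spec["demand"]["model_memory_gb"]
    have = spec["supply"]["accelerator_memory_gb"]
    return None if need <= have else f"needs {need} GB, has {have} GB"


def within_budget(spec: Spec) -> Optional[str]:
    cost = spec["supply"]["monthly_cost_usd"]
    budget = spec["ops"]["monthly_budget_usd"]
    return None if cost <= budget else f"cost {cost} exceeds budget {budget}"


def check_all(spec: Spec, constraints: List[Constraint]) -> List[str]:
    """Evaluate every constraint and report all violations, not just the first."""
    results = (constraint(spec) for constraint in constraints)
    return [message for message in results if message is not None]
```

Collecting every violation in one pass mirrors how a compiler reports all errors in a file, so a single run shows everything that must change in the spec.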