tools/sysdesign_platform/README.md
If we are building the "LeetCode for ML Systems," the design fundamentally shifts from a reading platform (flashcards) to a validation platform (simulated execution).
LeetCode works because you write code, click "Submit", and get an objective PASS or FAIL based on execution time and edge cases. You cannot do that with system design using simple text boxes. We must evaluate architecture, not algorithms.
Here is the blueprint for "IronLaw" (or whatever we name it), the interactive ML Systems interview platform.
Instead of writing a Python function to reverse a string, the user configures a system to meet an SLA (Service Level Agreement).
Task: Serve a 70B-parameter LLM to 5,000 concurrent users.
Constraints:
- Time-To-First-Token (TTFT) < 200ms
- Time-Per-Output-Token (TPOT) < 50ms
- Budget: < $50,000 / month
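A challenge like this can be stored as plain data, which keeps grading deterministic. The sketch below is a minimal illustration. It assumes a hypothetical `SLA` dataclass and a metrics dict with `ttft_ms`, `tpot_ms`, and `cost_usd_month` keys. None of these names come from mlsysim, and the TTFT value passed in is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLA:
    """Hypothetical challenge constraints; every bound is a strict upper limit."""
    ttft_ms: float         # Time-To-First-Token must be below this
    tpot_ms: float         # Time-Per-Output-Token must be below this
    cost_usd_month: float  # monthly spend must be below this


def grade(sla: SLA, metrics: dict[str, float]) -> tuple[bool, list[str]]:
    """Compare simulated metrics against each SLA bound and report every check."""
    checks = [
        ("TTFT", metrics["ttft_ms"], sla.ttft_ms, "ms"),
        ("TPOT", metrics["tpot_ms"], sla.tpot_ms, "ms"),
        ("Cost", metrics["cost_usd_month"], sla.cost_usd_month, "$/month"),
    ]
    passed = True
    lines = []
    for name, value, limit, unit in checks:
        ok = value < limit  # strict: hitting the limit exactly still fails
        passed = passed and ok
        lines.append(f"{name}: {value:g} {unit} (limit < {limit:g}) {'PASS' if ok else 'FAIL'}")
    return passed, lines


# The challenge above: 70B LLM, 5,000 concurrent users.
challenge = SLA(ttft_ms=200, tpot_ms=50, cost_usd_month=50_000)
ok, report = grade(challenge, {"ttft_ms": 150, "tpot_ms": 120, "cost_usd_month": 32_000})
print("PASSED" if ok else "FAILED: SLA Violation")
for line in report:
    print(line)
```

Reporting every check, not just the first failure, is what lets the feedback say "cost PASS, TPOT FAIL" in one message.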
The user doesn't type text. They use a Visual Architecture Builder (or a clean YAML config editor for power users). They drag and drop:
When the user clicks "Submit Architecture", we do not use an LLM to guess if they are right.
We pass their JSON configuration directly into our mlsysim engine.
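Here is a minimal sketch of that hand-off. It parses a hypothetical JSON submission and then runs the cheapest physical check first: do the quantized weights even fit in aggregate GPU memory? The schema (`model_params_b`, `weight_bits`, `gpu`, `num_gpus`, `batch_size`) and the small GPU table are illustrative assumptions, not mlsysim's actual input format. The memory and bandwidth figures are the published 80 GB SXM specs.

```python
import json
from dataclasses import dataclass

# Illustrative hardware table (HBM in GB, bandwidth in GB/s); not mlsysim's data.
GPU_SPECS = {
    "A100-80GB-SXM": {"mem_gb": 80, "bw_gbps": 2039},
    "H100-80GB-SXM": {"mem_gb": 80, "bw_gbps": 3350},
}


@dataclass(frozen=True)
class Architecture:
    model_params_b: float  # parameters, in billions
    weight_bits: int       # e.g. 16 for FP16, 8 for INT8
    gpu: str
    num_gpus: int
    batch_size: int


def parse_submission(raw: str) -> Architecture:
    """Parse and validate a hypothetical JSON architecture submission."""
    data = json.loads(raw)
    arch = Architecture(**data)  # TypeError on missing or unknown keys
    if arch.gpu not in GPU_SPECS:
        raise ValueError(f"unknown GPU: {arch.gpu}")
    if arch.num_gpus < 1 or arch.batch_size < 1:
        raise ValueError("num_gpus and batch_size must be positive")
    return arch


def weight_gb(arch: Architecture) -> float:
    """Weight footprint in GB: billions of params x bytes per param."""
    return arch.model_params_b * arch.weight_bits / 8


def weights_fit(arch: Architecture) -> bool:
    """True if the weights alone fit in aggregate HBM (ignores KV cache and activations)."""
    return weight_gb(arch) <= arch.num_gpus * GPU_SPECS[arch.gpu]["mem_gb"]


submission = ('{"model_params_b": 70, "weight_bits": 8, "gpu": "A100-80GB-SXM", '
              '"num_gpus": 8, "batch_size": 32}')
arch = parse_submission(submission)
print(weight_gb(arch), weights_fit(arch))
```

Rejecting malformed or physically impossible configs before simulation means the grader never has to score nonsense.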
The engine calculates the deterministic physics:
❌ FAILED: SLA Violation
Your architecture costs $32,000/month (PASS), but your TPOT is 120ms (FAIL).
Diagnostic: You are memory-bandwidth bound. Your 8x A100s (2.0 TB/s each) cannot stream the 70B INT8 weights fast enough at batch size 32.
Hint: You have spare compute. Have you considered trading compute for memory bandwidth using Speculative Decoding?
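A diagnostic like this comes from comparing memory traffic against compute for one decode step, i.e. a roofline bound. The sketch below computes an idealized lower bound under stated assumptions: perfect tensor parallelism, no interconnect, kernel, or scheduling overhead, and a Llama-2-70B-like KV-cache shape. It therefore does not try to reproduce the illustrative figures in the sample report. All names are hypothetical, not mlsysim's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    num_gpus: int
    mem_bw_bytes_per_s: float  # per-GPU HBM bandwidth
    peak_ops_per_s: float      # per-GPU peak throughput for the weight dtype


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache size for one token: K and V for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def decode_step_bound(hw: Hardware, params: float, bytes_per_weight: float,
                      batch: int, context_len: int, kv_per_token: int) -> tuple[float, str]:
    """Idealized roofline lower bound (seconds) on one decode step.

    One step emits one token per sequence, so this approximates TPOT.
    Assumes weights and KV cache split evenly across GPUs with zero overhead.
    """
    # Each step reads all weights plus the KV cache of every active sequence.
    bytes_moved = params * bytes_per_weight + batch * context_len * kv_per_token
    # Roughly one multiply and one add per weight per generated token.
    ops = 2 * params * batch
    t_mem = bytes_moved / (hw.num_gpus * hw.mem_bw_bytes_per_s)
    t_compute = ops / (hw.num_gpus * hw.peak_ops_per_s)
    bound = "memory-bandwidth" if t_mem >= t_compute else "compute"
    return max(t_mem, t_compute), bound


# 8x A100 80GB SXM (~2.039 TB/s HBM, 624 dense INT8 TOPS each), INT8 70B weights,
# batch 32, 2,048-token contexts, Llama-2-70B-like KV shape, FP16 cache (assumed).
a100x8 = Hardware(num_gpus=8, mem_bw_bytes_per_s=2.039e12, peak_ops_per_s=624e12)
kv = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
tpot_s, bound = decode_step_bound(a100x8, params=70e9, bytes_per_weight=1,
                                  batch=32, context_len=2048, kv_per_token=kv)
print(f"TPOT lower bound: {tpot_s * 1e3:.1f} ms ({bound} bound)")
```

In this example the compute term is several times smaller than the memory term, so the GPUs sit partly idle while waiting on HBM. That idle compute is the headroom the Speculative Decoding hint proposes spending.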
To build the addictive loop of LeetCode, we need status and progression.
mlsysim), their Elo rating increases.
On LeetCode, you want your algorithm to be in the "Top 1% of execution time." On our platform, architecture is a multi-objective optimization problem. When a user passes a challenge, they see a Pareto Frontier plot.
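Both mechanics reduce to small, well-defined computations. The sketch below scores each passing submission on two objectives to minimize (TPOT and monthly cost). It computes the non-dominated set a Pareto plot would highlight, then applies a standard Elo update that treats the challenge itself as a rated opponent. The `Solution` type, the K-factor of 32, and the leaderboard entries are hypothetical.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    user: str
    tpot_ms: float
    cost_usd_month: float


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.tpot_ms <= b.tpot_ms and a.cost_usd_month <= b.cost_usd_month
    strictly_better = a.tpot_ms < b.tpot_ms or a.cost_usd_month < b.cost_usd_month
    return no_worse and strictly_better


def pareto_frontier(solutions: list[Solution]) -> list[Solution]:
    """Solutions not dominated by any other (O(n^2), adequate for one challenge's leaderboard)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


def elo_update(user: float, challenge: float, passed: bool, k: float = 32.0) -> float:
    """Standard Elo update, treating the challenge as an opponent with its own rating."""
    expected = 1.0 / (1.0 + 10 ** ((challenge - user) / 400.0))
    return user + k * ((1.0 if passed else 0.0) - expected)


# Hypothetical passing submissions for one challenge.
board = [
    Solution("you", tpot_ms=45, cost_usd_month=32_000),
    Solution("sys_arch_99", tpot_ms=45, cost_usd_month=18_000),
    Solution("fast_but_pricey", tpot_ms=30, cost_usd_month=48_000),
]
print([s.user for s in pareto_frontier(board)])  # "you" is dominated by sys_arch_99
print(elo_update(1500, 1500, passed=True))
```

Passing a challenge rated well above the user yields a larger rating gain than passing an evenly matched one, which rewards attempting harder SLAs.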
sys_arch_99's solution, which achieved the exact same latency for only $18,000/month."
LeetCode's most valuable feature is the Discussion tab, where people explain the optimal solution. We can automate the perfect mentor.
If a user is stuck, they click "Ask the Architect".
book/quarto/).
We already have 80% of the pieces.
mlsysim): The engine takes the JSON representation of the user's architecture, runs the exact physics formulas you already wrote in the simulator, and returns the SLA metrics (Latency, Memory, Cost).
To build the "LeetCode for ML Systems," we must stop grading with text and start grading with physics. The core loop is: given a workload and an SLA, design an architecture that passes mlsysim's SLA check.