tinytorch/milestones/06_2018_mlperf/README.md
As ML models grew larger and deployment became critical, the community needed systematic optimization methodologies. MLPerf (2018), now stewarded by MLCommons, established standardized benchmarking and optimization workflows, shifting the focus from "can we build it?" to "can we deploy it efficiently?"
This milestone teaches production optimization: the systematic process of profiling, compressing, and accelerating models for real-world deployment.
A complete MLPerf-style optimization pipeline that takes YOUR networks from previous milestones and makes them production-ready!
This milestone has two scripts, each covering different optimization techniques:

**Script 1: `01_optimization_olympics.py`**
- Purpose: Optimize static models (MLP, CNN)
- Uses YOUR implementations:
- Networks from:

**Script 2: `02_generation_speedup.py`**
- Purpose: Speed up Transformer generation
- Uses YOUR implementations:
- Networks from:
```bash
# Optimize MLP/CNN (profiling + quantization + pruning)
python milestones/06_2018_mlperf/01_optimization_olympics.py
```
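To give a feel for what the optimization script does, here is a minimal NumPy sketch of two of the techniques it applies: post-training int8 quantization and magnitude pruning. This is illustrative toy code, not the milestone's actual implementation; the weight matrix, shapes, and thresholds are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64)).astype(np.float32)  # a stand-in weight matrix

# --- Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127] ---
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)   # stored weights are 4x smaller than float32
w_dq = w_q.astype(np.float32) * scale       # dequantize to measure the error
quant_err = np.abs(w - w_dq).max()          # bounded by scale / 2

# --- Magnitude pruning: zero out the 50% smallest-magnitude weights ---
threshold = np.quantile(np.abs(w), 0.5)
mask = np.abs(w) >= threshold
w_pruned = w * mask
sparsity = 1.0 - mask.mean()

print(f"max quantization error: {quant_err:.4f}")
print(f"sparsity after pruning: {sparsity:.2%}")
```

The key trade-off both techniques share: you give up a small, measurable amount of accuracy (quantization error, pruned weights) in exchange for a smaller, faster model, and profiling tells you whether the trade was worth it.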
```bash
# Speed up Transformer generation (KV caching)
python milestones/06_2018_mlperf/02_generation_speedup.py
```
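The idea behind the KV-cache speedup can be shown in a few lines. Without a cache, every generation step re-projects keys and values for the entire prefix (O(t) work per step); with a cache, each step projects only the newest token and appends it. The sketch below uses hypothetical toy functions and random projection matrices, not the milestone's actual API:

```python
import numpy as np

d = 16
rng = np.random.default_rng(0)
Wk = rng.normal(size=(d, d))  # toy key projection
Wv = rng.normal(size=(d, d))  # toy value projection

def kv_no_cache(tokens):
    # Recompute K/V for the whole prefix every step: O(t) projections at step t.
    return tokens @ Wk, tokens @ Wv

def kv_with_cache(new_token, cache):
    # Project only the newest token and append: O(1) projections per step.
    cache["K"] = np.concatenate([cache["K"], new_token @ Wk])
    cache["V"] = np.concatenate([cache["V"], new_token @ Wv])
    return cache["K"], cache["V"]

tokens = rng.normal(size=(8, d))
cache = {"K": np.empty((0, d)), "V": np.empty((0, d))}
for t in range(len(tokens)):
    K_cached, V_cached = kv_with_cache(tokens[t:t+1], cache)

K_full, V_full = kv_no_cache(tokens)
assert np.allclose(K_cached, K_full) and np.allclose(V_cached, V_full)
```

The cache trades memory (storing K and V for every generated token) for compute, which is why generation speedups from KV caching grow with sequence length.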
Or via `tito`:

```bash
tito milestone run 06
```
Unlike earlier milestones where you "build and run," optimization requires:
This is ML systems engineering: the skill that ships products!