mlsysim/paper/RELATED_WORK_GUIDE.md
A structured walkthrough of every cited paper, organized by function in the MLSysim narrative. Use this to verify each citation is justified and to spot gaps.
Each entry follows this format: `bibkey — Authors, "Title" (venue)`, with notes on where it appears in paper.tex.

These papers establish why MLSysim needs to exist.
- sutton2019bitter — Rich Sutton, "The Bitter Lesson" (2019)
- dean2012large — Dean et al., "Large Scale Distributed Deep Networks" (NeurIPS 2012)
- shoeybi2019megatron — Shoeybi et al., "Megatron-LM" (2019)

These papers define the modeling landscape MLSysim positions against.
- won2023astrasim2 — Won et al., "ASTRA-sim 2.0" (ISPASS 2023)
- calculon2023 — Isaev et al., "Calculon" (SC 2023)
- binkert2011gem5 — Binkert et al., "The gem5 Simulator" (2011)
- wang2025simai — Wang et al., "SimAI" (NSDI 2025)

Tools that operate at the operator/tile level — one abstraction below MLSysim.
- parashar2019timeloop — Parashar et al., "Timeloop" (ISPASS 2019)
- wu2019accelergy — Wu et al., "Accelergy" (ICCAD 2019)
- zhang2024llmcompass — Zhang et al., "LLMCompass" (ISCA 2024)

The tools most similar to MLSysim in spirit — analytical approaches trading fidelity for speed.
- qi2017paleo — Qi et al., "PALEO" (ICLR 2017)
- jia2019flexflow — Jia et al., "FlexFlow" (MLSys 2019)
- yu2021habitat — Yu et al., "Habitat" (ATC 2021)
- agrawal2024vidur — Agrawal et al., "Vidur" (MLSys 2024)
- bambhaniya2024genz — Bambhaniya et al., "GenZ" (2024)
- zhong2024distserve — Zhong et al., "DistServe" (OSDI 2024)
- agrawal2024sarathi — Agrawal et al., "Sarathi-Serve" (OSDI 2024)
- yuan2024llmviewer — Yuan et al., "LLM Inference Unveiled" (2024)
- kim2023llmanalysis — Li, Cheng, "llm-analysis" (GitHub 2023)
- liang2025lumos — Liang et al., "Lumos" (MLSys 2025)
- deepseek2025v3 — DeepSeek-AI, "DeepSeek-V3" (ISCA 2025)
- faiz2024llmcarbon — Faiz et al., "LLMCarbon" (ICLR 2024)
- lottick2019codecarbon — Lottick et al., "CodeCarbon" (2019)
- wongpanich2025fleet — Wongpanich et al., "ML Productivity Goodput" (2025)
- hennessy2024architecture — Hennessy & Patterson, Computer Architecture 7th ed. (2024)
- patterson2014organization — Patterson & Hennessy, Computer Organization and Design 5th ed. (2014)
- cox2011xv6 — Cox et al., "xv6" (MIT 2011)
- tanenbaum2006minix — Tanenbaum & Woodhull, MINIX / OS Design and Implementation (2006)
- mlsysbook2025 — Reddi et al., Machine Learning Systems textbook (2025)

Each wall cites the foundational paper for its core equation.
- williams2009roofline — Williams et al., "Roofline Model" (CACM 2009)
- chowdhery2022palm — Chowdhery et al., "PaLM" (JMLR 2023)
- pope2023llm — Pope et al., "Efficiently Scaling Transformer Inference" (MLSys 2023)
- kwon2023efficient — Kwon et al., "PagedAttention / vLLM" (SOSP 2023)
- lie2022cerebras — Lie, "Cerebras Architecture Deep Dive" (Hot Chips 2022)
- dean2013tail — Dean & Barroso, "The Tail at Scale" (CACM 2013)
- mohan2021analyzing — Mohan et al., "Analyzing and Mitigating Data Stalls" (VLDB 2021)
- murray2021tf — Murray et al., "tf.data" (VLDB 2021)
- leiserson1985fat — Leiserson, "Fat-Trees" (IEEE Trans. Computers 1985)
- hoffmann2022chinchilla — Hoffmann et al., "Chinchilla" (NeurIPS 2022)
- snell2024scaling — Snell et al., "Scaling LLM Test-Time Compute" (ICLR 2025)
- han2016deep — Han et al., "Deep Compression" (ICLR 2016, Best Paper)
- gholami2021survey — Gholami et al., "Survey of Quantization Methods" (2021)
- narayanan2021efficient — Narayanan et al., "Efficient Large-Scale LM Training" (SC 2021)
- daly2006higher — Daly, "Higher Order Estimate of Optimal Checkpoint Interval" (2006)
- young1974first — Young, "First Order Approximation to Optimum Checkpoint Interval" (CACM 1974)
- little1961proof — Little, "A Proof for L = λW" (Operations Research 1961)
- barroso2018datacenter — Barroso et al., The Datacenter as a Computer 3rd ed. (2018)
- barroso2007case — Barroso & Hölzle, "The Case for Energy-Proportional Computing" (IEEE Computer 2007)
- patterson2021carbon — Patterson et al., "Carbon Emissions and Large Neural Network Training" (2021)
- eisenman2022checknrun — Eisenman et al., "Check-N-Run" (NSDI 2022)
- abadi2016deep — Abadi et al., "Deep Learning with Differential Privacy" (CCS 2016)
- dao2022flashattention — Dao et al., "FlashAttention" (NeurIPS 2022)
- zheng2024sglang — Zheng et al., "SGLang" (2024)
- leviathan2023fast — Leviathan et al., "Fast Inference via Speculative Decoding" (ICML 2023)
- patel2024splitwise — Patel et al., "Splitwise" (ISCA 2024)
- frantar2023gptq — Frantar et al., "GPTQ" (ICLR 2023)
- lin2024awq — Lin et al., "AWQ" (MLSys 2024)
- nvidia2023h100 — NVIDIA, "H100 Tensor Core GPU Datasheet" (2023)
- mlperf2020 — Mattson et al., "MLPerf" (IEEE Micro 2020)
- llama3team2024 — Llama Team @ Meta, "The Llama 3 Herd of Models" (2024)
- box1976science — Box, "Science and Statistics" (JASA 1976)
- stephenson1999mco — Stephenson et al., "Mars Climate Orbiter Mishap Investigation Board Phase I Report" (NASA 1999)
- tinytorch2025 — Reddi et al., "TinyTorch" (2025)
- shazeer2017outrageously — Shazeer et al., "Outrageously Large Neural Networks: MoE" (ICLR 2017)
  - Note: `SparseTransformerWorkload.lower()` uses active params for FLOPs but total params for memory.

These 6 papers exist in references.bib but are never referenced in paper.tex.
- kaplan2020scaling — Kaplan et al., "Scaling Laws for Neural Language Models" (2020)
- rajbhandari2020zero — Rajbhandari et al., "ZeRO" (SC 2020)
- rasley2020deepspeed — Rasley et al., "DeepSpeed" (KDD 2020)
- gupta2022act — Gupta et al., "ACT" (ISCA 2022)
- amodei2018ai — Amodei & Hernandez, "AI and Compute" (OpenAI 2018)
- jouppi2017datacenter — Jouppi et al., "TPU v1" (ISCA 2017)

| Category | Count | Notes |
|---|---|---|
| Actively cited | 64 keys | All justified |
| In bib but uncited | 6 keys | 1 fix required (ZeRO), 5 optional |
| Most-cited paper | williams2009roofline | ~8 appearances — the theoretical backbone |
| Second most-cited | hennessy2024architecture | 6 appearances — the pedagogical backbone |
| Papers with dual roles | 7 | Equation source + validation anchor (PaLM, Chinchilla, etc.) |
| Unique venues represented | 25+ | ISCA, SOSP, NeurIPS, ICLR, MLSys, SC, OSDI, NSDI, etc. |
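Two of the wall equations flagged above are compact enough to sketch: the roofline bound that anchors the most-cited key (williams2009roofline), and the active-vs-total parameter split noted under shazeer2017outrageously. A minimal illustration, assuming rough rule-of-thumb constants (the 2·params FLOP estimate, FP16 weights) and loose hardware and model numbers that are not drawn from any cited datasheet or from MLSysim's actual implementation:

```python
# Illustrative sketches of two quantities the walls model; every constant
# below is a rule-of-thumb assumption, not MLSysim's real code.

def roofline_time_s(flops: float, bytes_moved: float,
                    peak_flops: float, mem_bw_bps: float) -> float:
    """Roofline bound (williams2009roofline): execution time is limited by
    whichever is slower, the compute or the memory traffic."""
    return max(flops / peak_flops, bytes_moved / mem_bw_bps)

def moe_footprint(total_params: float, active_params: float,
                  bytes_per_param: int = 2) -> tuple[float, float]:
    """The MoE accounting noted above: FLOPs/token scale with *active*
    params (~2 FLOPs per active param), while weight memory must hold
    every expert, so it scales with *total* params."""
    flops_per_token = 2.0 * active_params
    weight_bytes = total_params * bytes_per_param
    return flops_per_token, weight_bytes

# A kernel doing 1e12 FLOPs over 1e10 bytes on ~1e15 FLOP/s, 3e12 B/s
# hardware is memory-bound: 1e10/3e12 s of traffic > 1e12/1e15 s of math.
t = roofline_time_s(1e12, 1e10, 1e15, 3e12)

# Mixtral-like shape (illustrative): ~46.7e9 total params, ~12.9e9 active.
flops, mem = moe_footprint(46.7e9, 12.9e9)  # ~2.58e10 FLOPs/token, ~93.4e9 B
```

Auditing the sparse-transformer note then reduces to checking which of these two argument sets each code path receives.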
- Add \citep{rajbhandari2020zero} at L617, where "ZeRO/FSDP" appears uncited (the one required fix).
- Consider citing kaplan2020scaling as the precursor to Chinchilla in Wall 11.
- Consider citing amodei2018ai alongside Sutton in the Introduction.
- Drop rasley2020deepspeed, gupta2022act, and jouppi2017datacenter from references.bib if not adding citations.
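The one required fix is a single command in paper.tex. A sketch of what the L617 edit might look like; the surrounding sentence here is invented for illustration, and only the bib key comes from references.bib:

```latex
% Hypothetical sentence; only the \citep key is real.
Optimizer-state sharding in the style of ZeRO/FSDP~\citep{rajbhandari2020zero}
partitions optimizer state across data-parallel ranks.
```

The non-breaking space (`~`) before `\citep` keeps the citation from wrapping onto its own line, matching standard LaTeX practice.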