Topic Map — ML Systems Interview Playbook

interviews/TOPIC_MAP.md

This document is the master plan for the playbook. Every question we write traces back to a competency area defined here. When expanding the playbook, consult this map first — don't add questions that don't fill a gap.


The Framework

An ML systems interviewer evaluates 10 core competency areas. These are universal — they apply regardless of whether you deploy to a data center or a microcontroller. What changes across deployment tracks is how each competency manifests: the hardware, the constraints, the failure modes.

Each competency is tested at 4 mastery levels (L3 → L6+), reflecting increasing cognitive depth:

| Level | Cognitive Skill | Scope | Industry Mapping |
| --- | --- | --- | --- |
| L3 | Recall & Define | Task-level | Junior / New Grad |
| L4 | Apply & Identify | Component-level | Mid-level (2–4 yrs) |
| L5 | Analyze & Predict | System-level | Senior (5–8 yrs) |
| L6+ | Synthesize & Derive | Architecture-level | Staff / Principal (8+ yrs) |

The 10 Competency Areas

1. Compute Analysis

What it tests: Can you reason about whether a workload is compute-bound or memory-bound? Can you use the roofline model to diagnose performance?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | GPU roofline, Tensor Cores, HBM bandwidth, FP16/FP8 peak TFLOPS |
| 🤖 Edge | Integer roofline, TOPS/W, accelerator comparison under power caps |
| 📱 Mobile | NPU delegation, subgraph partitioning, heterogeneous scheduling (CPU/GPU/NPU) |
| 🔬 TinyML | MFLOPS on Cortex-M, no FPU, CMSIS-NN SIMD utilization |

Textbook grounding: Hardware Acceleration, Benchmarking, Compute Infrastructure
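The roofline diagnosis named above reduces to comparing a kernel's arithmetic intensity against the machine's ridge point. A minimal sketch, using illustrative (not vendor-exact) peak figures for an A100-class GPU:

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs per byte of DRAM traffic."""
    return flops / bytes_moved

def bound_by(intensity, peak_flops, peak_bandwidth):
    """Roofline: below the ridge point (peak_flops / peak_bandwidth)
    a kernel is memory-bound; above it, compute-bound."""
    ridge = peak_flops / peak_bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"

# Illustrative A100-class figures (assumptions, not vendor specs)
PEAK_FLOPS = 312e12   # FP16 tensor-core peak, FLOP/s
PEAK_BW = 2.0e12      # HBM bandwidth, bytes/s

# Square FP16 GEMM: C[M,N] = A[M,K] @ B[K,N]
M = N = K = 4096
flops = 2 * M * N * K                       # multiply-accumulate count
bytes_moved = 2 * (M * K + K * N + M * N)   # 2 bytes/elem, one pass each
ai = arithmetic_intensity(flops, bytes_moved)
verdict = bound_by(ai, PEAK_FLOPS, PEAK_BW)  # large GEMMs sit above the ridge
```

The same check with a memory-bound kernel (e.g. an elementwise op, intensity well below the ridge) returns "memory-bound", which is the interview-relevant distinction.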

2. Memory Systems

What it tests: Can you account for where every byte lives and moves? Can you diagnose memory bottlenecks?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | VRAM accounting (weights + optimizer + activations + KV-cache), HBM tiers, gradient checkpointing |
| 🤖 Edge | DRAM budgets shared with OS/sensors, DMA transfers, memory-mapped I/O |
| 📱 Mobile | Shared RAM with OS and apps, no dedicated VRAM, memory-mapped weights, app lifecycle eviction |
| 🔬 TinyML | SRAM partitioning, flat tensor arena, flash vs SRAM, operator scheduling for peak RAM |

Textbook grounding: Neural Computation, Model Training, Hardware Acceleration, Data Systems
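The cloud-track VRAM accounting can be made concrete. A sketch assuming the common mixed-precision Adam layout (16 bytes per parameter) and a standard multi-head-attention KV cache; activation memory is workload-dependent and omitted:

```python
def train_state_bytes(n_params):
    """Mixed-precision Adam rule of thumb, per parameter:
    FP16 weights (2) + FP16 grads (2) + FP32 master weights (4)
    + Adam moments m and v in FP32 (4 + 4) = 16 bytes.
    Activations are workload-dependent and excluded here."""
    return 16 * n_params

def kv_cache_bytes(batch, seq_len, n_layers, n_heads, head_dim,
                   bytes_per_elem=2):
    """K and V tensors per layer: 2 * batch * seq * heads * head_dim."""
    return 2 * batch * seq_len * n_layers * n_heads * head_dim * bytes_per_elem

# A 7B-parameter model needs ~112 GB of weight/optimizer state alone,
# before activations -- hence sharding or gradient checkpointing.
state_gb = train_state_bytes(7e9) / 1e9

# FP16 KV cache for one 4096-token sequence on a hypothetical
# 32-layer, 32-head, head_dim=128 model: 2 GiB
kv = kv_cache_bytes(1, 4096, 32, 32, 128)
```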

3. Numerical Representation

What it tests: Do you understand precision formats, quantization, and their system-level effects?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | FP16/BF16 mixed precision training, FP8 inference, loss scaling, underflow |
| 🤖 Edge | INT8 quantization-aware training, calibration strategies, per-channel vs per-tensor |
| 📱 Mobile | Float16 on NPU, quantized CPU fallback, silent accuracy loss from format conversion |
| 🔬 TinyML | INT8/INT4 only, zero-point arithmetic, requantization between layers, no floating point |

Textbook grounding: Model Compression, Performance Engineering
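A minimal sketch of the zero-point arithmetic named in the TinyML row: asymmetric affine quantization to uint8 [0, 255]. Real toolchains add per-channel scales and fixed-point requantization; this only shows the core mapping:

```python
def quant_params(fmin, fmax):
    """Asymmetric affine quantization parameters for uint8 [0, 255]."""
    scale = (fmax - fmin) / 255.0
    zero_point = round(-fmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point):
    """real -> uint8, with saturation at the type bounds."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    """uint8 -> real; round-trip error is at most ~scale/2."""
    return (q - zero_point) * scale

scale, zp = quant_params(-1.0, 1.0)  # zp = 128: real 0.0 maps exactly
```

The zero-point exists so that real 0.0 is represented exactly, which matters for padding and ReLU outputs.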

4. Model Architecture → System Cost

What it tests: Can you map architecture choices to resource consumption? Can you estimate cost before training?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | Transformer scaling laws, MoE routing overhead, attention complexity (O(n²) vs linear) |
| 🤖 Edge | CNN vs Transformer for real-time vision, model size vs frame budget trade-off |
| 📱 Mobile | MobileNet/EfficientNet design, on-device LLM feasibility, operator support constraints |
| 🔬 TinyML | Depthwise separable convolutions, inverted residuals, NAS for MCUs (MCUNet), operator support |

Textbook grounding: Network Architectures, Model Compression, Neural Computation
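Estimating cost before training often starts from the widely used ~6ND FLOP approximation plus the quadratic attention term. Both formulas below are rules of thumb, not exact counts:

```python
def train_flops(n_params, n_tokens):
    """Widely used approximation: ~6 FLOPs per parameter per token
    (2 forward + 4 backward)."""
    return 6 * n_params * n_tokens

def attention_flops(seq_len, d_model):
    """Score (QK^T) and weighted-sum (AV) matmuls per layer:
    each ~2 * seq^2 * d_model FLOPs -- hence the O(n^2) blow-up."""
    return 2 * (2 * seq_len * seq_len * d_model)

# Hypothetical run: 7B params on 2T tokens, ~8.4e22 FLOPs total
budget = train_flops(7e9, 2e12)
```

Note the interview-relevant property: doubling sequence length quadruples the attention cost, while doubling model width only doubles it.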

5. Latency & Throughput

What it tests: Can you decompose end-to-end latency and identify bottlenecks? Do you understand queueing theory?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | TTFT/TPOT, tail latency, continuous batching, queueing theory (Little's Law), Amdahl's Law |
| 🤖 Edge | Worst-case execution time (WCET), frame deadlines (33ms at 30 FPS), pipeline overlap |
| 📱 Mobile | UI jank budgets (16ms at 60 FPS), ANR timeouts, async inference, interaction latency |
| 🔬 TinyML | Microsecond inference, duty cycle timing, interrupt-driven pipelines |

Textbook grounding: Model Serving, Benchmarking, Inference at Scale
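The TTFT/TPOT decomposition and the Little's Law view both fit in one line each. A sketch with hypothetical serving numbers:

```python
def generation_latency_s(ttft_s, tpot_s, output_tokens):
    """Decode latency decomposition: time-to-first-token, then
    time-per-output-token for each remaining token."""
    return ttft_s + tpot_s * (output_tokens - 1)

def in_flight_requests(arrival_rate_rps, avg_latency_s):
    """Little's Law, L = lambda * W: concurrent requests the server
    must hold at a given arrival rate and mean latency."""
    return arrival_rate_rps * avg_latency_s

# Hypothetical point: 200 ms TTFT, 30 ms/token, 256-token replies
latency = generation_latency_s(0.2, 0.03, 256)   # ~7.85 s end to end
load = in_flight_requests(50, latency)           # ~392 requests in flight
```

The second number is why continuous batching matters: hundreds of requests are resident at once even at modest arrival rates.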

6. Power & Thermal

What it tests: Can you reason about energy as a first-class constraint, not an afterthought?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | 700W–1000W TDP per chip, PUE, liquid cooling, TCO dominated by electricity |
| 🤖 Edge | 15–75W thermal envelope, DVFS P-states, sustained vs burst performance |
| 📱 Mobile | 3–5W total device power, thermal throttling, Jevons paradox in battery drain |
| 🔬 TinyML | Milliwatts, energy harvesting, duty cycling, active vs sleep power budgets |

Textbook grounding: Sustainable AI, Compute Infrastructure, Hardware Acceleration
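Duty-cycled energy budgeting reduces to time-weighted averaging. A sketch with hypothetical numbers for an always-on keyword spotter on a coin cell (ideal battery, no conversion losses):

```python
def avg_power_mw(active_mw, sleep_mw, duty_cycle):
    """Time-weighted average power for a duty-cycled device."""
    return active_mw * duty_cycle + sleep_mw * (1.0 - duty_cycle)

def battery_life_hours(capacity_mah, voltage_v, draw_mw):
    """Ideal battery life: capacity (mWh) / average draw (mW)."""
    return capacity_mah * voltage_v / draw_mw

# Hypothetical keyword spotter: 45 mW active 1% of the time,
# 0.05 mW asleep, running from a 225 mAh / 3.0 V coin cell
p_avg = avg_power_mw(45.0, 0.05, 0.01)            # ~0.5 mW average
days = battery_life_hours(225, 3.0, p_avg) / 24   # ~56 days
```

The arithmetic makes the TinyML point: sleep current, not inference cost, often dominates the budget at low duty cycles.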

7. Model Optimization

What it tests: Can you make a model smaller/faster without destroying accuracy? Do you know when each technique applies?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | Knowledge distillation, speculative decoding, MoE, FlashAttention, kernel fusion |
| 🤖 Edge | QAT, structured pruning for accelerator alignment, TensorRT optimization |
| 📱 Mobile | Delegation-aware pruning, Core ML / TFLite optimization, operator fusion |
| 🔬 TinyML | Mixed-precision quantization, operator scheduling for peak RAM, binary/ternary networks |

Textbook grounding: Model Compression, Performance Engineering, Data Selection
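As one concrete optimization example, unstructured magnitude pruning can be sketched in a few lines. This is a toy version on a flat weight list; production pruning is applied per-tensor with masks, done iteratively, and usually followed by fine-tuning:

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights (unstructured)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    cutoff = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= cutoff and removed < k:   # cap removals at exactly k
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned
```

Note the edge-track caveat from the table: unstructured sparsity like this rarely speeds up accelerators, which is why structured pruning aligned to hardware tile sizes exists.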

8. Deployment & Serving

What it tests: Can you get a model into production and keep it running? Do you understand the deployment lifecycle?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | Kubernetes autoscaling, A/B rollout, canary deploys, cold start optimization, model registries |
| 🤖 Edge | OTA updates, A/B partitioned firmware, functional safety certification, rollback mechanisms |
| 📱 Mobile | App store delivery, on-demand model download, tiered models by device capability, model versioning |
| 🔬 TinyML | Flash programming, FOTA (firmware over-the-air), model fits in flash, bootloader constraints |

Textbook grounding: ML Operations, Model Serving, Fleet Orchestration, Edge Intelligence

9. Monitoring & Reliability

What it tests: Can you detect when a system is failing silently? Can you design for graceful degradation?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | Data drift detection (KL divergence, PSI), training-serving skew, MTBF/MTTR, straggler detection |
| 🤖 Edge | Degradation ladders, fail-safe vs fail-operational, sensor fusion validation, watchdog timers |
| 📱 Mobile | Crash reporting, silent accuracy loss, federated analytics, thermal state monitoring |
| 🔬 TinyML | Watchdog timers, hard real-time guarantees, self-test routines, anomaly detection on-device |

Textbook grounding: ML Operations, Fault Tolerance, Robust AI, Operations at Scale
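Drift detection via PSI, named in the cloud row, is a short computation over binned feature distributions. A sketch; the 0.1/0.25 thresholds in the docstring are a common rule of thumb, not a standard:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index over two pre-binned distributions.
    Common (non-standard) reading: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total
```

PSI is zero for identical distributions and grows symmetrically with divergence in either direction, which makes it a convenient dashboard metric for silent-failure detection.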

10. Security, Privacy & Fairness

What it tests: Can you reason about trust boundaries, data protection, and societal impact?

| Track | Manifestation |
| --- | --- |
| ☁️ Cloud | Prompt injection, DP-SGD, membership inference, model theft, bias amplification, subgroup evaluation |
| 🤖 Edge | Physical tampering, adversarial patches, safety certification (ISO 26262), supply chain integrity |
| 📱 Mobile | On-device differential privacy, federated learning, user data isolation, app sandboxing |
| 🔬 TinyML | Side-channel attacks, model extraction from flash, physical access threats, resource-constrained crypto |

Textbook grounding: Security & Privacy, Responsible Engineering, Robust AI


Coverage Matrix — Current State

This section summarizes current coverage for each track across the 10 competency areas. Following the March 2026 expansion, all tracks have been fully fleshed out with 150+ questions each.

☁️ Cloud Track (217 questions)

Status: Fully fleshed out across 6 rounds + visual debugging.

🤖 Edge Track (189 questions)

Status: Fully fleshed out across 5 rounds.

📱 Mobile Track (174 questions)

Status: Fully fleshed out across 5 rounds.

🔬 TinyML Track (168 questions)

Status: Fully fleshed out across 5 rounds.