interviews/tinyml/README.md
The TinyML track covers ML systems running on microcontrollers, where the entire model, runtime, and inference engine must fit in kilobytes of SRAM, execute in microseconds, and run on milliwatts of power.
In the cloud, you optimize for throughput. On mobile, you optimize for battery. In TinyML, you optimize for existence: can the model even fit? There is no operating system, no virtual memory, no dynamic allocation. The entire inference pipeline (weights, activations, scratch buffers, and application code) must coexist in a flat memory space measured in kilobytes. Every byte is a design decision.
These are the areas where TinyML-specific interview questions are most valuable. Each topic maps to real interview scenarios at companies like Arduino, Edge Impulse, and Qualcomm (always-on sensing), or at embedded AI teams inside larger companies.
<table>
  <thead>
    <tr>
      <th width="22%">Topic</th>
      <th width="28%">What TinyML interviews test</th>
      <th width="50%">Example scenario</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><b>Memory layout</b></td>
      <td>SRAM partitioning, activation reuse, operator scheduling for peak RAM</td>
      <td>"Your model needs 300 KB peak RAM but you only have 256 KB SRAM. How do you fit it without changing the model?"</td>
    </tr>
    <tr>
      <td><b>Quantization</b></td>
      <td>INT8, INT4, binary/ternary, fixed-point arithmetic, post-training vs QAT</td>
      <td>"Your keyword spotting model loses 8% accuracy going from INT8 to INT4. Is that acceptable? How do you recover it?"</td>
    </tr>
    <tr>
      <td><b>Integer-only inference</b></td>
      <td>No floating point: all math in fixed-point, requantization between layers</td>
      <td>"Explain how a quantized Conv2D executes on a Cortex-M4 with no FPU."</td>
    </tr>
    <tr>
      <td><b>Model architecture</b></td>
      <td>MobileNet, MCUNet, depthwise separable convolutions, NAS for MCUs</td>
      <td>"Why does MobileNetV2 use inverted residuals, and why does that matter on a microcontroller?"</td>
    </tr>
    <tr>
      <td><b>Power &amp; energy</b></td>
      <td>Active vs sleep power, duty cycling, energy harvesting budgets</td>
      <td>"Your sensor wakes up every 10 seconds, runs inference, and sleeps. What's the average power draw?"</td>
    </tr>
    <tr>
      <td><b>Compiler &amp; runtime</b></td>
      <td>TFLite Micro, TVM, CMSIS-NN, ahead-of-time compilation, no dynamic allocation</td>
      <td>"Why can't TFLite Micro use malloc? What does it use instead?"</td>
    </tr>
    <tr>
      <td><b>Sensor pipelines</b></td>
      <td>Audio (keyword spotting), accelerometer (gesture), image (person detection)</td>
      <td>"Your microphone samples at 16 kHz. How do you extract Mel spectrograms in real time on a Cortex-M4?"</td>
    </tr>
  </tbody>
</table>

We need more TinyML questions, especially from engineers at Edge Impulse, Arduino, and embedded AI teams. See the question format and submit a PR.