tinytorch/site/milestones/03_mlp_ABOUT.md
- How networks automatically discover features (edges, patterns) you never programmed
- Why representation learning is the foundation of deep learning's power
- That YOUR code can achieve 95%+ accuracy on a real benchmark
For 17 years, neural networks were considered dead.
After Minsky and Papert's 1969 XOR critique (Milestone 02), funding dried up, researchers moved on, and "neural network" became a dirty word in AI. The field was stuck in the AI Winter.
Then in 1986, Rumelhart, Hinton, and Williams published a single paper that changed everything: "Learning representations by back-propagating errors." They showed that multi-layer networks could learn automatically, with no hand-crafted features and no expert rules. Just data in, patterns out.
This milestone recreates that breakthrough. You'll train YOUR TinyTorch implementation on real images and watch it discover features you never programmed.
Multi-layer perceptrons (MLPs) for digit recognition:
Images --> Flatten --> Linear --> ReLU --> Linear --> ReLU --> Linear --> Classes
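The pipeline above can be sketched in plain NumPy to make the data flow concrete. This is an illustration only, not TinyTorch's API: the function names, the hidden widths (32 and 16), and the 8x8 input size are assumptions chosen for the example.

```python
import numpy as np

def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def mlp_forward(images, params):
    """Flatten -> (Linear -> ReLU) for hidden layers -> Linear.

    images: array of shape (batch, height, width)
    params: list of (W, b) pairs, one per Linear layer
    """
    x = images.reshape(images.shape[0], -1)   # Flatten each image to a vector
    for i, (W, b) in enumerate(params):
        x = x @ W + b                         # Linear
        if i < len(params) - 1:               # no ReLU after the last layer
            x = relu(x)
    return x                                  # one raw score (logit) per class

rng = np.random.default_rng(0)
sizes = [64, 32, 16, 10]                      # hypothetical widths: 8x8 input, 10 classes
params = [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

images = rng.random((4, 8, 8))                # stand-in batch of four 8x8 images
logits = mlp_forward(images, params)
print(logits.shape)                           # (4, 10)
```

The last layer returns raw scores rather than probabilities; a softmax is usually folded into the cross-entropy loss.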
| Module | Component | What It Provides |
|---|---|---|
| 01-04 | Foundation | Tensor, Activations, Layers, Losses |
| 05 | DataLoader | YOUR batching and data pipeline |
| 06-08 | Training Infrastructure | Autograd, Optimizers, Training loops |
Before running, ensure you have completed Modules 01-08. You can check your progress:
tito module status
cd milestones/03_1986_mlp
# Part 1: Quick validation
python 01_rumelhart_tinydigits.py
# Expected: 75-85% accuracy
# Part 2: Full MNIST benchmark
python 02_rumelhart_mnist.py
# Expected: 94-97% accuracy
| Script | Dataset | Parameters | Accuracy | Training Time |
|---|---|---|---|---|
| 01 (TinyDigits) | 1K train, 8x8 | ~2.4K | 75-85% | 3-5 min |
| 02 (MNIST) | 60K train, 28x28 | ~100K | 94-97% | 10-15 min |
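The parameter counts in the table can be sanity-checked with a little arithmetic: each fully-connected layer has `n_in * n_out` weights plus `n_out` biases. The layer widths below are assumptions, picked because they land close to the table's figures; the milestone scripts may use a different architecture.

```python
def count_mlp_params(layer_sizes):
    """Total weights plus biases for a stack of fully-connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical widths, not taken from the milestone scripts.
tiny_params = count_mlp_params([64, 32, 10])      # 8x8 input   -> 64 features
mnist_params = count_mlp_params([784, 128, 10])   # 28x28 input -> 784 features

print(tiny_params)    # 2410
print(mnist_params)   # 101770
```

Almost all of the MNIST parameters sit in the first layer (784 x 128), which is why input resolution dominates the size of an MLP.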
Watch YOUR network learn something you never taught it.
After training, examine the first hidden layer weights. You'll see edge-like detectors: horizontal, vertical, and diagonal patterns. Nobody programmed these. The network discovered them because edges are useful for recognizing digits.
This is representation learning, the foundation of deep learning's power.
The moment you realize: Your ~100 lines of TinyTorch code just replicated the breakthrough that ended the AI Winter.
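One way to examine the first hidden layer weights is to reshape each hidden unit's incoming weights back into an image. The sketch below assumes the weight matrix is stored as `(input_features, hidden_units)` and uses random values as a stand-in for trained weights; `weights_to_tiles` is a hypothetical helper, not part of TinyTorch.

```python
import numpy as np

def weights_to_tiles(W, side):
    """Reshape a (side*side, hidden) weight matrix into (hidden, side, side)
    tiles, each rescaled to [0, 1] so it can be displayed as an image."""
    if W.shape[0] != side * side:
        raise ValueError("first dimension of W must equal side * side")
    tiles = W.T.reshape(-1, side, side)            # one tile per hidden unit
    lo = tiles.min(axis=(1, 2), keepdims=True)
    hi = tiles.max(axis=(1, 2), keepdims=True)
    return (tiles - lo) / np.maximum(hi - lo, 1e-12)

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 16))                     # stand-in for trained weights
tiles = weights_to_tiles(W, side=28)
print(tiles.shape)                                 # (16, 28, 28)
```

Each tile can then be passed to an image viewer such as matplotlib's `imshow`. With random weights the tiles look like noise; after training, structure appears.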
Every component comes from YOUR implementations:
| Component | Your Module | What It Does |
|---|---|---|
| Tensor | Module 01 | Stores images and weights |
| Linear | Module 03 | YOUR fully-connected layers |
| ReLU | Module 02 | YOUR activation functions |
| CrossEntropyLoss | Module 04 | YOUR loss computation |
| DataLoader | Module 05 | YOUR batching pipeline |
| backward() | Module 06 | YOUR autograd engine |
| SGD | Module 07 | YOUR optimizer |
No PyTorch. No TensorFlow. Just YOUR code learning to read handwritten digits.
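To show how the pieces in the table fit together, here is a single training step written out by hand in NumPy. It is a sketch under stated assumptions, not the milestone's code: the gradients are derived manually where TinyTorch's autograd would call `backward()`, and the layer sizes, learning rate, and random toy batch are made up for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy loss and its gradient with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    n = logits.shape[0]
    loss = -np.log(probs[np.arange(n), labels]).mean()
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0                      # softmax minus one-hot
    return loss, grad / n

def train_step(params, x, y, lr):
    """Forward pass, manual backward pass, and SGD update for a 2-layer MLP."""
    W1, b1, W2, b2 = params
    h_pre = x @ W1 + b1                # Linear
    h = np.maximum(h_pre, 0.0)         # ReLU
    logits = h @ W2 + b2               # Linear
    loss, d_logits = softmax_cross_entropy(logits, y)

    # Backward: apply the chain rule layer by layer, last layer first.
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h_pre = (d_logits @ W2.T) * (h_pre > 0)   # ReLU passes gradient where input > 0
    dW1 = x.T @ d_h_pre
    db1 = d_h_pre.sum(axis=0)

    # SGD: step each parameter against its gradient, in place.
    for p, g in zip(params, (dW1, db1, dW2, db2)):
        p -= lr * g
    return loss

rng = np.random.default_rng(0)
x = rng.random((32, 64))                        # toy batch: 32 flattened 8x8 "images"
y = rng.integers(0, 10, size=32)                # random labels, for illustration only
params = [rng.normal(0.0, 0.1, (64, 32)), np.zeros(32),
          rng.normal(0.0, 0.1, (32, 10)), np.zeros(10)]

losses = [train_step(params, x, y, lr=0.05) for _ in range(200)]
print(losses[-1] < losses[0])                   # the loss should fall on this toy batch
```

An autograd engine automates exactly the middle block: it records the forward operations and produces `dW1`, `db1`, `dW2`, and `db2` without hand-written derivatives.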
MNIST (1998) became THE benchmark for evaluating learning algorithms. MLPs hitting 95%+ proved neural networks were viable for real problems.
The backpropagation paper has been cited over 50,000 times and is considered one of the most influential papers in computer science.
MLPs treat images as flat vectors, ignoring spatial structure. A 28x28 image has 784 pixels, but the MLP doesn't know that pixel (0,0) is next to pixel (0,1). Milestone 04 (CNN) shows why convolutional layers dramatically improve image recognition.
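A small NumPy experiment makes the "flat vector" limitation concrete. Assuming row-major flattening (NumPy's default), vertically adjacent pixels end up 28 positions apart, and scrambling every image with the same fixed permutation changes nothing for a linear layer once its weight rows are reordered to match. The variable names here are illustrative.

```python
import numpy as np

side = 28

def flat_index(row, col):
    """Position of pixel (row, col) after row-major flattening."""
    return row * side + col

print(flat_index(0, 1) - flat_index(0, 0))   # 1  (horizontal neighbours stay adjacent)
print(flat_index(1, 0) - flat_index(0, 0))   # 28 (vertical neighbours are far apart)

rng = np.random.default_rng(0)
x = rng.random((5, side * side))             # five flattened stand-in images
W = rng.normal(size=(side * side, 16))       # first Linear layer, 16 hidden units
perm = rng.permutation(side * side)          # one fixed shuffle of pixel positions

out_original = x @ W
out_shuffled = x[:, perm] @ W[perm, :]       # shuffle pixels and weight rows together
print(np.allclose(out_original, out_shuffled))   # True
```

Because the layer's output is unchanged by a consistent shuffle, an MLP has no built-in notion of which pixels are neighbours.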