tinytorch/INSTRUCTOR.md
Complete guide for teaching ML Systems Engineering with TinyTorch.
TinyTorch teaches ML systems engineering through building, not just using. Students construct a complete ML framework from tensors to transformers, understanding memory, performance, and scaling at each step.
```bash
# Clone and setup
git clone https://github.com/harvard-edge/cs249r_book.git
cd cs249r_book/tinytorch

# Virtual environment (MANDATORY)
python -m venv .venv
source .venv/bin/activate

# Install with instructor tools
pip install -r requirements.txt
pip install nbgrader

# Setup grading infrastructure
tito nbgrader init

tito system health
# Should show all green checkmarks

tito nbgrader
# Should show available NBGrader commands
```
**Note:** The NBGrader integration is under active development; use it for testing only. Grading workflows run through the `tito nbgrader` commands.
```bash
# Generate instructor version (with solutions)
tito nbgrader generate 01_tensor

# Create student version (solutions removed)
tito nbgrader release 01_tensor
# Student version will be in: assignments/release/01_tensor/
```
**Option A: GitHub Classroom (recommended)**

1. Create an assignment repository from TinyTorch
2. Remove solutions from the modules
3. Students clone and work

**Option B: Direct distribution**

Share the contents of the `release/` directory.
```bash
# Collect all student submissions
tito nbgrader collect 01_tensor

# Or a specific student
tito nbgrader collect 01_tensor --student student_id

# Grade all submissions
tito nbgrader autograde 01_tensor

# Grade a specific student
tito nbgrader autograde 01_tensor --student student_id
```
Use NBGrader's formgrader for manual review. It launches a web interface for:

- Reviewing ML Systems question responses
- Adding feedback comments
- Adjusting auto-grades

```bash
nbgrader formgrader
```
```bash
# Create feedback files for students
tito nbgrader feedback 01_tensor

# Export grades report
tito nbgrader report

# Or a specific module
tito nbgrader report --module 01_tensor
```
| Points | Criteria |
|---|---|
| 9-10 | Demonstrates deep understanding, references specific code, discusses systems implications |
| 7-8 | Good understanding, some code references, basic systems thinking |
| 5-6 | Surface understanding, generic response, limited systems perspective |
| 3-4 | Attempted but misses key concepts |
| 0-2 | No attempt or completely off-topic |
**What to Look For:**
This section provides sample solutions to help calibrate grading standards. Use these as reference points when evaluating student submissions.
**Excellent Solution (9-10 points):**

```python
def memory_footprint(self):
    """Calculate tensor memory in bytes."""
    return self.data.nbytes
```

**Why Excellent:**
- Uses NumPy's `nbytes` property directly

**Good Solution (7-8 points):**
```python
def memory_footprint(self):
    """Calculate memory usage."""
    return np.prod(self.data.shape) * self.data.dtype.itemsize
```

**Why Good:**
- Correct, but manually recomputes what `nbytes` already provides

**Acceptable Solution (5-6 points):**
```python
def memory_footprint(self):
    size = 1
    for dim in self.data.shape:
        size *= dim
    return size * 4  # Assumes float32
```

**Why Acceptable:**
- Correct shape arithmetic, but hardcodes 4 bytes per element, so it is wrong for any dtype other than `float32`
**Excellent Solution (9-10 points):**

```python
def backward(self, gradient=None):
    """Backward pass through computational graph."""
    if gradient is None:
        gradient = np.ones_like(self.data)
    self.grad = gradient
    if self.grad_fn is not None:
        # Compute gradients for inputs
        input_grads = self.grad_fn.backward(gradient)
        # Propagate to input tensors
        if isinstance(input_grads, tuple):
            for input_tensor, input_grad in zip(self.grad_fn.inputs, input_grads):
                if input_tensor.requires_grad:
                    input_tensor.backward(input_grad)
        else:
            if self.grad_fn.inputs[0].requires_grad:
                self.grad_fn.inputs[0].backward(input_grads)
```
**Why Excellent:**
- Checks `requires_grad` before propagating

**Good Solution (7-8 points):**
```python
def backward(self, gradient=None):
    if gradient is None:
        gradient = np.ones_like(self.data)
    self.grad = gradient
    if self.grad_fn:
        grads = self.grad_fn.backward(gradient)
        for inp, grad in zip(self.grad_fn.inputs, grads):
            inp.backward(grad)
```
**Why Good:**
- Works, but omits the `requires_grad` check (minor issue)

**Acceptable Solution (5-6 points):**

```python
def backward(self, grad):
    self.grad = grad
    if self.grad_fn:
        self.grad_fn.inputs[0].backward(self.grad_fn.backward(grad))
```

**Why Acceptable:**
- Propagates only to the first input and provides no default gradient
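When grading backward implementations, a finite-difference check is a quick way to verify a student's analytic gradients. This standalone sketch uses plain NumPy; `numerical_grad` is a helper defined here for illustration, not part of TinyTorch:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar-valued f at array x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps          # nudge one element up
        f_plus = f(x)
        x[idx] = orig - eps          # then down
        f_minus = f(x)
        x[idx] = orig                # restore
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Example: f(x) = sum(x**2) has analytic gradient 2x
x = np.array([1.0, -2.0, 3.0])
num = numerical_grad(lambda v: np.sum(v**2), x)
assert np.allclose(num, 2 * x, atol=1e-4)
```

Comparing a student's `tensor.grad` against such numbers catches sign errors and missing terms that unit tests sometimes miss.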
**Excellent Solution (9-10 points):**

```python
def forward(self, x):
    """Forward pass with explicit loops for clarity."""
    batch_size, in_channels, height, width = x.shape
    out_height = (height - self.kernel_size + 2 * self.padding) // self.stride + 1
    out_width = (width - self.kernel_size + 2 * self.padding) // self.stride + 1
    output = np.zeros((batch_size, self.out_channels, out_height, out_width))
    requires_grad = x.requires_grad  # Capture before padding (np.pad returns a plain ndarray)
    # Apply padding
    if self.padding > 0:
        x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding),
                       (self.padding, self.padding)), mode='constant')
    # Explicit convolution loops
    for b in range(batch_size):
        for oc in range(self.out_channels):
            for oh in range(out_height):
                for ow in range(out_width):
                    h_start = oh * self.stride
                    w_start = ow * self.stride
                    h_end = h_start + self.kernel_size
                    w_end = w_start + self.kernel_size
                    window = x[b, :, h_start:h_end, w_start:w_end]
                    # Bias is added once per output element, not once per window entry
                    output[b, oc, oh, ow] = np.sum(window * self.weight[oc]) + self.bias[oc]
    return Tensor(output, requires_grad=requires_grad)
```

**Why Excellent:**
- Handles padding and stride and derives the output dimensions correctly
**Good Solution (7-8 points):**

```python
def forward(self, x):
    B, C, H, W = x.shape
    out_h = (H - self.kernel_size) // self.stride + 1
    out_w = (W - self.kernel_size) // self.stride + 1
    out = np.zeros((B, self.out_channels, out_h, out_w))
    for b in range(B):
        for oc in range(self.out_channels):
            for i in range(out_h):
                for j in range(out_w):
                    h = i * self.stride
                    w = j * self.stride
                    out[b, oc, i, j] = np.sum(
                        x[b, :, h:h+self.kernel_size, w:w+self.kernel_size]
                        * self.weight[oc]
                    ) + self.bias[oc]
    return Tensor(out)
```

**Why Good:**
- Correct stride handling, but no padding support

**Acceptable Solution (5-6 points):**
```python
def forward(self, x):
    out = np.zeros((x.shape[0], self.out_channels, x.shape[2]-2, x.shape[3]-2))
    for b in range(x.shape[0]):
        for c in range(self.out_channels):
            for i in range(out.shape[2]):
                for j in range(out.shape[3]):
                    out[b, c, i, j] = np.sum(x[b, :, i:i+3, j:j+3] * self.weight[c])
    return Tensor(out)
```

**Why Acceptable:**
- Works only for 3×3 kernels with stride 1, and omits the bias
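All three tiers hinge on the output-size formula `(size - kernel + 2*padding) // stride + 1`. A small standalone helper (not part of TinyTorch; names are illustrative) makes it easy to spot-check a student's dimension math:

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size - kernel_size + 2 * padding) // stride + 1

# 32x32 input, 3x3 kernel, stride 1, no padding -> 30x30
assert conv_output_size(32, 3) == 30
# "Same" padding of 1 preserves the size for a 3x3 kernel
assert conv_output_size(32, 3, padding=1) == 32
# Stride 2 with padding 1 halves the spatial size
assert conv_output_size(32, 3, stride=2, padding=1) == 16
```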
**Excellent Solution (9-10 points):**

```python
def forward(self, query, key, value, mask=None):
    """Scaled dot-product attention with numerical stability."""
    # Compute attention scores
    scores = np.dot(query, key.T) / np.sqrt(self.d_k)
    # Apply mask if provided
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Softmax with numerical stability
    exp_scores = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
    attention_weights = exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)
    # Apply attention to values
    output = np.dot(attention_weights, value)
    return output, attention_weights
```

**Why Excellent:**
- Scales by √d_k, supports masking, and stabilizes the softmax by subtracting the row max
**Good Solution (7-8 points):**

```python
def forward(self, q, k, v):
    scores = np.dot(q, k.T) / np.sqrt(q.shape[-1])
    weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
    return np.dot(weights, v)
```

**Why Good:**
- Correct scaling and per-row softmax, but no numerical-stability trick and no masking
**Acceptable Solution (5-6 points):**

```python
def forward(self, q, k, v):
    scores = np.dot(q, k.T)
    weights = np.exp(scores) / np.sum(np.exp(scores))
    return np.dot(weights, v)
```

**Why Acceptable:**
- No √d_k scaling, and the softmax normalizes over the whole matrix instead of per row
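The gap between these tiers is largely numerical stability. This standalone NumPy sketch (independent of TinyTorch) demonstrates why the max-subtraction trick matters once attention scores get large:

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)  # overflows for large inputs
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_stable(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))  # largest exponent is 0
    return e / np.sum(e, axis=-1, keepdims=True)

scores = np.array([[1000.0, 1001.0, 1002.0]])  # plausible pre-softmax magnitudes

with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive(scores)   # exp(1000) -> inf, inf/inf -> nan
stable = softmax_stable(scores)     # well-defined

assert np.isnan(naive).any()
assert np.allclose(np.sum(stable), 1.0)
```

The stable version is exactly equivalent mathematically, since softmax is invariant to subtracting a constant from every score in a row.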
**When Evaluating Student Code:**

1. **Correctness First:** Does it pass all tests?
2. **Code Quality**
3. **Systems Thinking**
4. **Common Patterns**
Remember: These are calibration examples. Adjust based on your course level and learning objectives. The goal is consistent evaluation, not perfection.
By course end, students should be able to:
```bash
# Check a specific student's progress
tito module status --student student_id

# Export all module progress
tito module status --export class_progress.csv
```
Look for:
```python
# Use profilers liberally
with TimeProfiler("operation"):
    result = expensive_operation()

# Show memory usage
print(f"Memory: {get_memory_usage():.2f} MB")
```
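`TimeProfiler` and `get_memory_usage` above are TinyTorch helpers. For equivalent measurements outside TinyTorch, a rough stdlib-only sketch could look like this (the names `time_profiler` and `peak_memory_mb` are illustrative, not TinyTorch APIs):

```python
import time
import tracemalloc
from contextlib import contextmanager

@contextmanager
def time_profiler(label):
    """Minimal stand-in for a timing context manager."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed * 1000:.2f} ms")

def peak_memory_mb(fn, *args):
    """Peak Python-level allocation while running fn(*args), in MB."""
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

with time_profiler("list build"):
    data = [i * i for i in range(100_000)]

print(f"Peak memory: {peak_memory_mb(lambda: [0] * 1_000_000):.2f} MB")
```

Note that `tracemalloc` only sees Python allocations; NumPy buffer memory is better observed via `nbytes`, as in the memory-footprint rubric above.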
**Environment Problems**

```bash
# Student fix:
tito system health
tito module reset XX  # Reset specific module if needed
```

**Module Import Errors**

```bash
# Rebuild package
tito export --all
```

**Test Failures**

```bash
# Detailed test output
tito module test MODULE --verbose
```

**Database Locked**

```bash
# Clear NBGrader database
rm gradebook.db
tito nbgrader init
```

**Missing Submissions**

```bash
# Check submission directory
ls submitted/*/MODULE/
```
| Week | Module | Focus |
|---|---|---|
| 1 | 01 Tensor | Data Structures, Memory |
| 2 | 02 Activations | Non-linearity Functions |
| 3 | 03 Layers | Neural Network Components |
| 4 | 04 Losses | Optimization Objectives |
| 5 | 05 DataLoader | Data Pipeline |
| 6 | 06 Autograd | Automatic Differentiation |
| 7 | 07 Optimizers | Training Algorithms |
| 8 | 08 Training | Complete Training Loop |
| 9 | Midterm Project | Build and Train Network |
| 10 | 09 Spatial | Convolutions, CNNs |
| 11 | 10 Tokenization | Text Processing |
| 12 | 11 Embeddings | Word Representations |
| 13 | 12 Attention | Attention Mechanisms |
| 14 | 13 Transformers | Transformer Architecture |
| 15 | 14-19 Optimization | Profiling, Quantization, etc. |
| 16 | 20 Capstone | Torch Olympics Competition |
Need help? Open an issue or contact the TinyTorch team!