skills/creative/manim-video/references/paper-explainer.md
How to turn a research paper into an animated explainer video.
A research paper is optimized for precision and completeness. A video is optimized for understanding and retention. The translation is NOT "read the paper aloud with pictures" — it's "extract the core insight and make it feel obvious through visual storytelling."
The paper has one job: prove the claim is true. The video has a different job: make the viewer understand WHY the claim is true, and WHY it matters.
Before anything, decide the audience:
| Audience | Prerequisites | Pacing | Depth |
|---|---|---|---|
| General public | None | Slow, many analogies | Intuition only, skip proofs |
| Undergrad students | Basic math/CS | Medium, some formalism | Key equations, skip derivations |
| Grad students / researchers | Domain knowledge | Faster, more notation | Full equations, sketch proofs |
This determines everything: vocabulary, pacing, which sections to animate, how much math to show.
Most paper explainers fit this structure (scale times proportionally for longer videos):
| Section | Duration | Purpose |
|---|---|---|
| Hook | 0:00-0:30 | Surprising result or provocative question |
| Problem | 0:30-1:30 | What was broken/missing before this paper |
| Key insight | 1:30-3:00 | The core idea, explained visually |
| How it works | 3:00-4:00 | Method/algorithm, simplified |
| Evidence | 4:00-4:30 | Key result that proves it works |
| Implications | 4:30-5:00 | Why it matters, what it enables |
Write the full narration before any code. Every sentence maps to a visual beat. If you can't write the narration, you don't understand the paper well enough to animate it.
## Hook (30s)
"What if I told you that a model with 7 billion parameters can outperform
one with 70 billion — if you train it on the right data?"
## Problem (60s)
"The standard approach is to scale up. More parameters, more compute.
[VISUAL: bar chart showing model sizes growing exponentially]
But Chinchilla showed us that most models are undertrained..."
After the narration, break it into scenes. Each scene is one Manim class.
Scene 1: Hook — surprising stat with animated counter
Scene 2: Problem — model size bar chart growing
Scene 3: Key insight — training data vs parameters, animated 2D plot
Scene 4: Method — pipeline diagram building left to right
Scene 5: Results — before/after comparison with animated bars
Scene 6: Closing — implications text
Before coding scenes, define the visual language:
# style.py — import in every scene file
BG = "#0D1117"
PRIMARY = "#58C4DD"
SECONDARY = "#83C167"
ACCENT = "#FFFF00"
HIGHLIGHT = "#FF6B6B"
MONO = "Menlo"
# Color meanings for THIS paper
MODEL_COLOR = PRIMARY # "the model"
DATA_COLOR = SECONDARY # "training data"
BASELINE_COLOR = HIGHLIGHT # "previous approach"
RESULT_COLOR = ACCENT # "our result"
When the paper has a key equation, don't just show it — build it from intuition:
# Scene: Why we need attention (for a Transformer paper)
# Step 1: "How do we let each word look at every other word?"
# Step 2: Show naive approach (fully connected = O(n²) everything)
# Step 3: Show it breaks (information overload, no selectivity)
# Step 4: "What if each word could CHOOSE which words to attend to?"
# Step 5: Show attention equation — Q, K, V now mean something
# Show equation dimmed first (full destination)
eq = MathTex(r"Attention(Q,K,V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V")
eq.set_opacity(0.15)
self.play(FadeIn(eq))
# Highlight Q, K, V one at a time with color + label
for part, color, label_text in [
(r"Q", PRIMARY, "Query: what am I looking for?"),
(r"K", SECONDARY, "Key: what do I contain?"),
(r"V", ACCENT, "Value: what do I output?"),
]:
eq.set_color_by_tex(part, color)
label = Text(label_text, font_size=18, color=color, font=MONO)
# position label, animate it, wait, then dim it
Don't show the full architecture at once. Build it:
# Component factory
def make_box(label, color, width=2.0, height=0.8):
box = RoundedRectangle(corner_radius=0.1, width=width, height=height,
color=color, fill_opacity=0.1, stroke_width=1.5)
text = Text(label, font_size=18, font=MONO, color=color).move_to(box)
return Group(box, text)
encoder = make_box("Encoder", PRIMARY)
decoder = make_box("Decoder", SECONDARY).next_to(encoder, RIGHT, buff=1.5)
arrow = Arrow(encoder.get_right(), decoder.get_left(), color=DIM, stroke_width=1.5)
self.play(FadeIn(encoder))
self.wait(1) # explain encoder
self.play(GrowArrow(arrow))
self.play(FadeIn(decoder))
self.wait(1) # explain decoder
After building the diagram, show data moving through it:
# Dot traveling along the pipeline
data_dot = Dot(color=ACCENT, radius=0.1).move_to(encoder)
self.play(FadeIn(data_dot))
self.play(MoveAlongPath(data_dot, arrow), run_time=1)
self.play(data_dot.animate.move_to(decoder), run_time=0.5)
self.play(Flash(data_dot.get_center(), color=ACCENT), run_time=0.3)
# Before/after bars
before_data = [45, 52, 38, 61]
after_data = [78, 85, 72, 91]
labels = ["Task A", "Task B", "Task C", "Task D"]
before_chart = BarChart(before_data, bar_names=labels,
y_range=[0, 100, 20], bar_colors=[HIGHLIGHT]*4).scale(0.6).shift(LEFT*3)
after_chart = BarChart(after_data, bar_names=labels,
y_range=[0, 100, 20], bar_colors=[SECONDARY]*4).scale(0.6).shift(RIGHT*3)
before_label = Text("Baseline", font_size=20, color=HIGHLIGHT, font=MONO)
after_label = Text("Ours", font_size=20, color=SECONDARY, font=MONO)
# Reveal baseline first, then ours (dramatic comparison)
self.play(Create(before_chart), FadeIn(before_label))
self.wait(1.5)
self.play(Create(after_chart), FadeIn(after_label))
self.wait(0.5)
# Highlight the improvement
improvement = Text("+35% avg", font_size=24, color=ACCENT, font=MONO)
self.play(FadeIn(improvement))
tracker = ValueTracker(0)
curve = always_redraw(lambda: axes.plot(
lambda x: 1 - 0.8 * np.exp(-x / 3),
x_range=[0, tracker.get_value()], color=PRIMARY
))
epoch_label = always_redraw(lambda: Text(
f"Epoch {int(tracker.get_value())}", font_size=18, font=MONO
).to_corner(UR))
self.add(curve, epoch_label)
self.play(tracker.animate.set_value(10), run_time=5, rate_func=linear)
ValueTrackerLinearTransformationScene for linear algebraArrowVectorField / StreamLinesNumberPlane + trajectoriesShowPassingFlash for data flow along arrowsZoomedScene for zooming into components