docs/2-CORE-CONCEPTS/podcasts-explained.md
Podcasts are Open Notebook's highest-level transformation: converting your research into audio dialogue for a different consumption pattern.
Research naturally accumulates as text: PDFs, articles, web pages, notes. This creates a friction point: consuming it means scheduling focused reading time at a desk. But much of life is passive time: commuting, exercising, walking. Podcasts convert your research into audio dialogue so you can consume it during those moments.
Before (Text-based):

    Research pile → Must schedule reading time → Requires focus

After (Podcast):

    Research pile → Podcast → Can listen while commuting
                            → Absorb while exercising
                            → Understand while walking
                            → Engage without screen time
You choose what goes into the podcast:

    Notebook content → Which sources? → Which notes?
                     → Which topics to focus on?
                     → Depth of coverage?
You define how you want the podcast structured:

    Episode Profile
    ├─ Topic: "AI Safety Approaches"
    ├─ Length: 20 minutes
    ├─ Tone: Academic but accessible
    ├─ Format: Debate (2 speakers with opposing views)
    ├─ Audience: Researchers new to the field
    └─ Focus areas: Main approaches, pros/cons, open questions
You create speaker personas (1-4 speakers):

    Speaker 1: "Expert Alex"
    ├─ Expertise: "Deep knowledge of alignment research"
    ├─ Personality: "Rigorous, academic, patient with explanation"
    ├─ Accent: (Optional) "British English"
    ├─ Voice Model: Selected from model registry (e.g., OpenAI TTS)
    └─ Optional per-speaker override of the episode's default voice model

    Speaker 2: "Researcher Sam"
    ├─ Expertise: "Field observer, pragmatic perspective"
    ├─ Personality: "Curious, asks clarifying questions"
    ├─ Accent: "American English"
    └─ Voice Model: Selected from model registry (e.g., ElevenLabs TTS)
System generates episode outline:

    EPISODE: "AI Safety Approaches"
    1. Introduction (2 min)
       Alex: Introduces topic and speakers
       Sam: What will we cover today?
    2. Main Approaches (8 min)
       Alex: Explains top 3 approaches
       Sam: Asks about tradeoffs
    3. Debate: Best approach? (6 min)
       Alex: Advocates for approach A
       Sam: Argues for approach B
    4. Open Questions (3 min)
       Both: What's unsolved?
    5. Conclusion (1 min)
       Recap and where to learn more
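One useful way to think about the outline: it is a list of timed segments whose durations should add up to the episode profile's target length. A sketch (the structure is illustrative, not the internal format):

```python
# Each outline section carries a time budget; together they should match
# the 20-minute length set in the episode profile.
outline = [
    {"section": "Introduction", "minutes": 2},
    {"section": "Main Approaches", "minutes": 8},
    {"section": "Debate: Best approach?", "minutes": 6},
    {"section": "Open Questions", "minutes": 3},
    {"section": "Conclusion", "minutes": 1},
]
total_minutes = sum(seg["minutes"] for seg in outline)
```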
System generates dialogue based on outline:

    Alex: "Today we're exploring three major approaches to AI alignment..."
    Sam: "That's a great start. Can you break down what we mean by alignment?"
    Alex: "Good question. Alignment means ensuring AI systems pursue the goals
           we actually want them to pursue, not just what we literally asked for.
           There's a classic example of a paperclip maximizer..."
    Sam: "Interesting. So it's about solving the intention problem?"
    Alex: "Exactly. And that's where the three approaches come in..."
System converts dialogue to audio using the voice models configured in the model registry. Credentials are automatically resolved from each model's configuration.
    Alex's text → Voice model (from registry) → Alex's voice (audio file)
    Sam's text  → Voice model (from registry) → Sam's voice (audio file)
    Audio files → Mix together → Final podcast MP3
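The per-speaker synthesis step can be sketched as below. `synthesize()` is a hypothetical stand-in for a real TTS client resolved from the model registry, and the final step simply concatenates segments rather than doing real audio mixing:

```python
def synthesize(text: str, voice_model: str) -> bytes:
    # Stand-in for a real provider call (OpenAI, ElevenLabs, local engine);
    # returns placeholder bytes tagged with the voice model used.
    return f"[{voice_model}] {text}\n".encode()

def render_episode(dialogue: list[tuple[str, str]], voices: dict[str, str]) -> bytes:
    """Turn (speaker, line) pairs into one audio stream, speaker by speaker."""
    segments = [synthesize(line, voices[speaker]) for speaker, line in dialogue]
    return b"".join(segments)  # a real mixer would join MP3 segments with crossfades

audio = render_episode(
    [("Alex", "Today we're exploring three major approaches..."),
     ("Sam", "Can you break down what we mean by alignment?")],
    voices={"Alex": "openai-tts", "Sam": "elevenlabs-tts"},
)
```

The design point is the dispatch: each line is routed to the voice model configured for its speaker, which is what lets Alex and Sam use different providers in the same episode.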
Podcast generation involves multiple steps (outline, transcript, TTS) and depends on external AI providers. Sometimes things fail.
When podcast generation fails (e.g., wrong model configured, API key expired, provider outage):
| Error | What to Do |
|---|---|
| Invalid API key | Check Settings → Credentials for the TTS and language model providers |
| Model not found | Verify the model exists in the model registry and has valid credentials configured |
| Rate limit exceeded | Wait a few minutes and retry |
| Provider unavailable | Check provider status page; retry later |
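For transient failures like rate limits, retrying with exponential backoff usually resolves the problem. This is a generic pattern, not Open Notebook's built-in behavior:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit error."""

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on rate limits with delays of 1s, 2s, 4s, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"count": 0}
def flaky_tts_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("try again later")
    return "audio-bytes"

result = with_retries(flaky_tts_call, base_delay=0.01)  # short delay for the demo
```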
Podcasts are generated in the background: you request an episode → the system processes it → you listen when it's ready.
Why? Podcast generation takes time (10+ minutes for a 30-minute episode). Blocking would lock up your interface.
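Conceptually, background generation behaves like a job queue you poll. The functions below are a hypothetical sketch of that model, not Open Notebook's API:

```python
# In-memory stand-in for a background job store.
jobs: dict[str, dict] = {}

def submit_podcast_job(episode_id: str) -> str:
    jobs[episode_id] = {"status": "processing", "result": None}
    return episode_id  # the caller gets a handle back immediately; UI stays responsive

def complete_job(job_id: str, mp3_path: str) -> None:
    jobs[job_id] = {"status": "done", "result": mp3_path}

def poll(job_id: str) -> dict:
    return jobs[job_id]

job_id = submit_podcast_job("ai-safety-ep1")
status_before = poll(job_id)["status"]        # still "processing"
complete_job(job_id, "ai-safety-ep1.mp3")     # a background worker finishes later
status_after = poll(job_id)["status"]         # now "done"
```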
Unlike Google NotebookLM (always 2 hosts), you choose 1-4 speakers.
Why? Different discussions work better with different formats: a solo expert monologue, a two-person interview, or a multi-voice debate.
You create rich speaker profiles, not just "Host A" and "Host B".
Why? Makes podcasts more engaging and authentic. Different speakers bring different perspectives.
You're not locked into one voice provider.
Why? Providers differ in cost, quality, and speed, and you can even mix providers across speakers in the same episode.
You can generate podcasts entirely offline with local text-to-speech.
Why? For sensitive research, nothing is ever sent to external APIs.
Traditional: Academic paper → PDF
Problem: Hard to consume, linear reading required
Open Notebook:

    Research materials → Podcast (expert explaining methodology)
                       → Podcast (debate format: different interpretations)
                       → Different consumption for different audiences
Blog creator: Has research pile on a topic
Problem: Doesn't have time to write the article
Solution:
Add research → Create podcast → Transcribe → Becomes article
OR: Podcast BECOMES the content (upload to podcast platforms)
Educator: Has reading materials for a course
Problem: Students don't read the papers
Solution:
Create podcast with expert explaining papers
Students listen → Better engagement → Discussions can reference podcast
Product manager: Has interviews with customers
Problem: Too many hours of audio to review
Solution:
Create podcast with debate format (customer perspective vs. team perspective)
Much more engaging than raw transcripts
Domain expert: Leaving the organization
Problem: How to preserve expertise?
Solution:
Create expert-mode podcast explaining frameworks, decision-making, context
New team member listens, gets context faster than reading 100 documents
They complement each other:
    1. Build notebook (add sources)
       ↓
    2. Apply transformations (extract insights)
       ↓
    3. Chat/Ask (explore content)
       ↓
    4. Decide on podcast
       ├─→ Create speaker profiles
       ├─→ Define episode profile
       ├─→ Configure voice models (from model registry)
       └─→ Generate podcast
       ↓
    5. Listen while commuting/exercising
       ↓
    6. Reference sources for deep dive
       ↓
    7. Repeat for different formats/speakers/focus
You can create different podcasts from the same sources:
    Podcast 1: "Expert Monologue"
      Speaker: Researcher explaining field
      Format: Educational, comprehensive
      Audience: Students new to field

    Podcast 2: "Debate Format"
      Speakers: Optimist vs. skeptic
      Format: Discussion of tradeoffs
      Audience: Advanced researchers

    Podcast 3: "Interview Format"
      Speakers: Journalist + expert
      Format: Q&A about practical applications
      Audience: Industry practitioners
Each tells the same story from different angles.
Option 1: Cloud TTS (Faster, Higher Quality)

    Your outline → API call to TTS provider
                 → Audio returned
                 → Stored in your notebook

Provider sees: Your outlined script (not raw sources)
Privacy level: Medium (outline is shared, sources aren't)

Option 2: Local TTS (Slower, Maximum Privacy)

    Your outline → Local TTS engine (runs on your machine)
                 → Audio generated locally
                 → Stored in your notebook

Provider sees: Nothing
Privacy level: Maximum (everything local)
| Provider | Cost | Quality | Speed |
|---|---|---|---|
| OpenAI | ~$0.015 per minute | Good | Fast |
| | ~$0.004 per minute | Excellent | Fast |
| ElevenLabs | ~$0.10 per minute | Exceptional | Medium |
| Local TTS | Free | Basic | Slow |
A 30-minute podcast therefore costs roughly $0.45 with OpenAI, about $3.00 with ElevenLabs, and nothing with local TTS.
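The per-minute rates above translate directly into episode cost. A quick calculator using the table's approximate rates:

```python
# Approximate per-minute rates from the comparison table above.
RATES_PER_MINUTE = {
    "openai": 0.015,
    "elevenlabs": 0.10,
    "local": 0.0,
}

def estimate_cost(minutes: int, provider: str) -> float:
    """Rough episode cost in USD for a given provider."""
    return round(minutes * RATES_PER_MINUTE[provider], 2)
```

For a 30-minute episode this gives about $0.45 on OpenAI versus $3.00 on ElevenLabs, which is why provider choice matters if you generate podcasts regularly.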
Podcasts transform your research consumption:
| Aspect | Text | Podcast |
|---|---|---|
| How consumed? | Active reading | Passive listening |
| Where consumed? | Desk | Anywhere |
| Multitasking | Hard | Easy |
| Time commitment | Scheduled | Flexible |
| Format | As authored (PDF, article, notes) | Natural dialogue |
| Engagement | Academic | Conversational |
| Accessibility | Text-based | Audio-based |
In Open Notebook specifically, you control the speakers, the episode structure, the voice models, and where the audio is generated.
This is why podcasts matter: they change when and how you can consume your research.