Open Notebook uses different approaches to make AI models aware of your research, depending on the feature. This section explains RAG (used in Ask) and full-content context (used in Chat).

Most tools pick one of three strategies:

- **Option 1: Fine-tuning.** Retrain the model on your data: slow and permanent.
- **Option 2: Send everything to the cloud.** Ship all your content with every request: costly and bad for privacy.
- **Option 3: Ignore your data.** Answer from the model's general knowledge alone.

Open Notebook takes none of these paths. Instead:

- **For Chat:** it sends the entire selected content to the LLM.
- **For Ask:** it uses Retrieval-Augmented Generation (RAG) to send only the relevant pieces.
When you upload a source, Open Notebook prepares it for retrieval:
1. **Extract text.** PDF → text, URL → webpage text, audio → transcribed text, video → subtitles + transcription.
2. **Chunk into pieces.** Long documents are broken into ~500-word chunks. Why? AI context windows have limits, and smaller pieces retrieve more precisely.
3. **Create embeddings.** Each chunk becomes a semantic vector: a list of numbers representing its meaning. Why? Vectors allow finding chunks by similarity, not just keywords.
4. **Store in database.** Chunks, embeddings, and metadata go into searchable storage.
Example:

```
Source: "AI Safety Research 2026" (50-page PDF)
        ↓
Extracted: 50 pages of text
        ↓
Chunked: 150 chunks (~500 words each)
        ↓
Embedded: each chunk gets a vector (1536 numbers for OpenAI)
        ↓
Stored: ready for search
```
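The chunking step can be sketched in a few lines of Python. This is a simplification: real splitters usually respect sentence boundaries and add overlap between chunks, and they attach metadata (source, page number) so citations can point back to the original.

```python
def chunk_text(text, max_words=500):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# A long paper of 12,500 words (e.g. ~250 words/page over 50 pages):
doc = "word " * 12_500
chunks = chunk_text(doc)
print(len(chunks))  # 25 chunks, each exactly 500 words here
```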
When you ask a question, the system finds relevant content:
1. **You ask a question.** "What does the paper say about alignment?"
2. **The system converts your question to an embedding.** The question becomes a vector, produced the same way the chunks were.
3. **Similarity search.** The system finds the chunks most similar to your question, using vector math rather than keyword matching.
4. **Top results are returned.** Usually the 5-10 most similar chunks.
5. **You get back:** the relevant chunks, where they came from (sources and page numbers), and relevance scores.
Example:

```
Q: "What does the paper say about alignment?"
        ↓
Q vector: [0.23, -0.51, 0.88, ..., 0.12]
        ↓
Search: compare to all chunk vectors
        ↓
Results:
- Chunk 47 (alignment section): similarity 0.94
- Chunk 63 (safety approaches): similarity 0.88
- Chunk 12 (related work): similarity 0.71
```
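The search step above can be sketched as cosine similarity over stored chunk vectors. Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings; the function names are illustrative, not Open Notebook's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Rank stored chunks by similarity to the query vector."""
    sims = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    sims.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return sims[:k]

# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings:
chunk_vecs = [
    [1.0, 0.0, 0.0],   # chunk 0: off-topic
    [0.0, 1.0, 0.1],   # chunk 1: close to the query
    [0.1, 0.9, 0.2],   # chunk 2: also close
]
query_vec = [0.0, 1.0, 0.0]
print(top_k_chunks(query_vec, chunk_vecs, k=2))  # chunk 1 first, then chunk 2
```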
Now you have the relevant pieces. The AI uses them:
```
SYSTEM BUILDS A PROMPT:

  "You are an AI research assistant.
   The user has the following research materials:

   [CHUNK 47 CONTENT]
   [CHUNK 63 CONTENT]

   User question: 'What does the paper say about alignment?'
   Answer based on the above materials."

AI RESPONDS:

  "Based on the research materials, the paper approaches
   alignment through [pulls from chunks] and emphasizes
   [pulls from chunks]..."

SYSTEM ADDS CITATIONS:

  - See research materials page 15 for approach details
  - See research materials page 23 for emphasis on X
```
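Prompt assembly like this can be sketched as simple string formatting. The template, field names, and chunk text below are hypothetical illustrations, not Open Notebook's actual prompt format.

```python
def build_rag_prompt(question, retrieved):
    """Assemble an LLM prompt from retrieved chunks (hypothetical
    template, not Open Notebook's actual format)."""
    context = "\n\n".join(
        f"[Source: {c['source']}, p. {c['page']}]\n{c['text']}"
        for c in retrieved
    )
    return (
        "You are an AI research assistant.\n"
        "The user has the following research materials:\n\n"
        f"{context}\n\n"
        f"User question: '{question}'\n"
        "Answer based only on the materials above, citing pages."
    )

# Made-up chunk content for illustration:
retrieved = [
    {"source": "AI Safety Research 2026", "page": 15,
     "text": "Alignment is approached via oversight and evaluation."},
]
print(build_rag_prompt("What does the paper say about alignment?", retrieved))
```

Keeping source and page in each chunk's metadata is what makes the citation step possible afterwards.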
Open Notebook provides two different search strategies for different goals.
**Keyword (full-text) search**

How it works: matches the literal words of your query against the words stored in each chunk, ranking chunks by how often and how exactly the terms appear.

When to use: you know the exact terms to look for, such as names, jargon, or specific phrases.

Example:

```
Search: "transformer architecture"
Results:
1. Chunk with "transformer architecture" 3 times
2. Chunk with "transformer" and "architecture" separately
3. Chunk with "transformer-based models"
```
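A ranking like this can be sketched with a naive term-count score. This is illustrative only: real full-text engines add stemming and weighting (e.g. BM25), which is how variants like "transformer-based" can still match.

```python
def keyword_score(query, chunk):
    """Naive keyword relevance: count occurrences of each query term."""
    terms = query.lower().split()
    words = chunk.lower().split()
    return sum(words.count(t) for t in terms)

chunks = [
    "The transformer architecture uses a transformer architecture "
    "stack and the transformer architecture scales well.",
    "The transformer encoder and the overall architecture differ.",
    "Transformer-based models dominate NLP benchmarks.",
]
scores = [keyword_score("transformer architecture", c) for c in chunks]
print(scores)  # exact-term repeats score highest; the variant scores 0 here
```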
**Semantic (vector) search**

How it works: embeds your query and compares it to the stored chunk embeddings, ranking chunks by similarity of meaning rather than shared words.

When to use: you are searching for a concept and may not know the exact wording the source uses.

Example:

```
Search: "what's the mechanism for model understanding?"
Results (no "understanding" in any chunk):
1. Chunk about interpretability and mechanistic analysis
2. Chunk about feature analysis
3. Chunk about attention mechanisms
```

Why? The vectors are semantically similar to your concept.
Here's where Open Notebook is different: You decide what the AI sees.
| Level | What's Shared | Example Cost | Privacy | Use Case |
|---|---|---|---|---|
| Full Content | Complete source text | 10,000 tokens | Low | Detailed analysis, close reading |
| Summary Only | AI-generated summary | 2,000 tokens | High | Background material, references |
| Not in Context | Nothing | 0 tokens | Max | Confidential, irrelevant, or archived |
**Full Content:**

```
You: "What's the methodology in paper A?"
System:
- Searches paper A
- Retrieves the full paper content (or large chunks)
- Sends to AI: "Here's paper A. Answer about methodology."
- AI analyzes the complete content
Result: a detailed, precise answer
```

**Summary Only:**

```
You: "I want to chat using papers A and B"
System:
- Paper A: sends the AI-generated summary (not the full text)
- Paper B: sends the full content (for detailed analysis)
- The AI sees both sources, but at different levels of detail
Result: summaries give background; full content carries the detail
```

**Not in Context:**

```
You: "I have 10 sources but only want 5 in context"
System:
- Papers A-E: in context (sent to the AI)
- Papers F-J: not in context (the AI can't see or search them)
- The AI never knows these 5 sources exist
Result: a tight, focused context
```
**Privacy: you control what leaves your system.**

- Scenario: confidential company docs alongside public research
- Control: public research in context; confidential docs excluded
- Result: the AI never sees the confidential content
**Cost: you control token usage.**

- Scenario: 100 sources, of which 5 need detailed analysis and 95 are background
- Control: full content for the 5, summaries for the 95
- Result: roughly 80% lower token cost than sending everything in full
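With the illustrative per-source token counts from the table earlier (10,000 for full content, 2,000 for a summary), the saving works out as follows:

```python
FULL, SUMMARY = 10_000, 2_000   # illustrative per-source token counts

everything_full = 100 * FULL                 # 1,000,000 tokens
mixed = 5 * FULL + 95 * SUMMARY              # 50,000 + 190,000 = 240,000 tokens
savings = 1 - mixed / everything_full
print(f"{savings:.0%}")                      # 76%, in line with the ~80% figure
```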
**Quality: you control what the AI focuses on.**

- Scenario: 20 sources, but the question requires deep analysis of one
- Control: full content for the relevant source; exclude the others
- Result: the AI doesn't get distracted and gives a better answer
**Important:** Chat and Ask use completely different approaches.

**Chat: full selected content, no retrieval**

How it works:
```
YOU:
1. Select which sources to include in context
2. Set the context level (full / summary / excluded)
3. Ask your question

SYSTEM:
- Takes ALL selected sources (respecting context levels)
- Sends the ENTIRE content to the LLM at once
- No search, no retrieval, no chunking
- The AI sees everything you selected

AI:
- Responds based on the full content you provided
- Can reference any part of the selected sources
- Conversational: context stays for follow-ups
```
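The context-level logic above can be sketched as a small selection function. This is a sketch only: the field names and structure are hypothetical, not Open Notebook's actual schema.

```python
def build_chat_context(sources):
    """Build the Chat context string from per-source context levels
    (hypothetical schema, for illustration)."""
    parts = []
    for s in sources:
        if s["level"] == "full":
            parts.append(s["text"])      # complete source text
        elif s["level"] == "summary":
            parts.append(s["summary"])   # AI-generated summary only
        # level == "excluded": the LLM never sees this source at all
    return "\n\n".join(parts)

sources = [
    {"level": "full",     "text": "Paper A full text ...", "summary": "A, briefly."},
    {"level": "summary",  "text": "Paper B full text ...", "summary": "Paper B, briefly."},
    {"level": "excluded", "text": "Confidential memo.",    "summary": "Secret."},
]
context = build_chat_context(sources)
print(context)
```

Note how the excluded source leaves no trace in the context at all, which is the privacy guarantee described above.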
Use this when:

- You want a conversation with follow-up questions
- You need the AI to see your selected sources in full
- The selected content fits in the model's context window

Advantages:

- Nothing relevant is missed: the AI sees everything you selected
- Context persists across the whole conversation

Limitations:

- Token cost grows with every source you include
- Very large selections can exceed the context window
**Ask: Retrieval-Augmented Generation (RAG)**

How it works:
```
YOU:
Ask one complex question

SYSTEM:
1. Analyzes your question
2. Searches across ALL your sources automatically
3. Finds relevant chunks using vector similarity
4. Retrieves only the most relevant pieces
5. Sends ONLY those chunks to the LLM
6. Synthesizes a comprehensive answer

AI:
- Sees ONLY the retrieved chunks (not full sources)
- Answers based on what was found to be relevant
- One-shot answer (not conversational)
```
Use this when:

- You have one specific question and want a direct, cited answer
- Your sources are too large to send in full
- You don't know which source holds the answer

Advantages:

- Searches all your sources automatically; no manual selection needed
- Sends only the relevant chunks, keeping token costs low
- Returns citations: sources, page numbers, and relevance scores

Limitations:

- One-shot: there is no conversational follow-up
- The AI sees only the retrieved chunks, so surrounding context is lost
Open Notebook's RAG approach gives you something you don't get with ChatGPT or Claude directly: you control the boundary between what the AI can see and what stays on your system. Because everything is retrieved explicitly, you can ask where an answer came from and check it against the cited chunks. That makes hallucinations and misrepresentation far easier to catch than in most systems.
The magic of semantic search comes from embeddings. Here's the intuition:
Alongside the raw text, each chunk is stored as a list of numbers (a vector) that represents its "meaning."
```
Chunk: "The transformer uses attention mechanisms"
Vector: [0.23, -0.51, 0.88, 0.12, ..., 0.34]    (1536 numbers for OpenAI)

Another chunk: "Attention allows models to focus on relevant parts"
Vector: [0.24, -0.48, 0.87, 0.15, ..., 0.35]    (similar numbers = similar meaning!)
```
Text that is semantically similar produces similar vectors. So:
```
Your question: "How do models understand their decisions?"
Question vector: [0.25, -0.50, 0.86, 0.14, ..., 0.33]

Compare to all stored vectors and find the most similar:
- Chunk about interpretability: similarity 0.94
- Chunk about explainability: similarity 0.91
- Chunk about feature attribution: similarity 0.88

Return the top matches.
```
This is why semantic search finds conceptually similar content even when words are different.
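The "similar numbers = similar meaning" intuition can be checked directly with cosine similarity, using truncated toy versions of the vectors above (real embeddings have 1536 dimensions, and the unrelated vector here is invented for contrast):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, i.e. same meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Truncated toy versions of the vectors shown above:
attention = [0.23, -0.51, 0.88, 0.12]
focus     = [0.24, -0.48, 0.87, 0.15]
unrelated = [-0.90, 0.10, -0.30, 0.70]

print(cosine(attention, focus))      # close to 1.0: similar meaning
print(cosine(attention, unrelated))  # much lower: different meaning
```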
**Why search instead of fine-tuning?** Fine-tuning is slow and permanent; search is flexible and reversible.

**Why explicit retrieval?** You can verify exactly what the AI saw, you get audit trails, and you control what leaves your system.

**Why both keyword and semantic search?** Different questions need different search strategies; giving you both is more powerful.

**Why per-source context control?** Not everything you save needs to reach the AI; you control it granularly.
Open Notebook gives you two ways to work with AI:

- **Chat:** you choose the context, the full selected content goes to the LLM, and the conversation continues with follow-ups.
- **Ask:** you ask one question, the system retrieves the relevant chunks via RAG, and you get a one-shot, cited answer.

Both approaches keep you in control of what the AI sees, what it costs, and what stays private.
Coming Soon: The community is working on adding RAG capabilities to Chat as well, giving you the best of both worlds.