Sources are the raw materials of your research. This guide covers how to add different types of content.
Upload a file:
1. In your notebook, click "Add Source"
2. Select "Upload File"
3. Choose a file from your computer
4. Click "Upload"
5. Wait 30-60 seconds for processing
6. Done! The source appears in your notebook.
Add a web link:
1. Click "Add Source"
2. Select "Web Link"
3. Paste URL: https://example.com/article
4. Click "Add"
5. Wait for processing (usually faster than files)
6. Done!
Paste text directly:
1. Click "Add Source"
2. Select "Text"
3. Paste or type your content
4. Click "Save"
5. Done! The text is available immediately.
File size limit: up to ~100 MB (varies by system)
Processing time: 10 seconds to 2 minutes, depending on length and file type
Automatic transcription: audio and video files are transcribed to text automatically. This requires enabling speech-to-text in Settings.
For web content, paste the URL into the "Web Link" section.
When you add a source, the system automatically does four things:
1. Extract text: file/URL → readable text (scanned PDFs get OCR; videos are transcribed if speech-to-text is enabled)
2. Break into chunks: long text → ~500-word pieces, so search finds specific passages instead of whole documents
3. Create embeddings: each chunk → a vector representation, which enables semantic (concept) search
4. Index and store: everything → database, ready to search and retrieve
Time to use: the source is ready as soon as the progress bar completes. Embeddings are created in the background, so concept (vector) search may pick up the source slightly later.
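The chunking step above can be illustrated with a short Python sketch. It assumes whitespace-separated words and fixed 500-word pieces with no overlap; the app's actual splitter may count tokens rather than words, overlap neighboring chunks, or respect paragraph boundaries. `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most `size` words (no overlap)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    words = text.split()  # whitespace-separated words
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    document = "lorem " * 1200  # stand-in for a 1,200-word source
    chunks = chunk_words(document)
    print(len(chunks))                       # 3
    print([len(c.split()) for c in chunks])  # [500, 500, 200]
```

Smaller chunks let search point at a specific passage; the trade-off is that each chunk carries less surrounding context.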
PDF best practices:
Clean PDFs:
1. Upload → Done
2. Processing time: ~30-60 seconds
Scanned/Image PDFs:
1. Upload the same way
2. The system detects this automatically and uses OCR
3. Processing time: ~2-3 minutes (longer, due to OCR overhead)
Large PDFs (50+ pages):
1. Consider splitting into smaller files
2. Or upload as-is (the system handles it)
3. Processing time scales with size
Common issues:
Web link best practices:
1. Copy the full URL from your browser: https://example.com/article-title
2. Paste it into "Web Link"
3. Click "Add"
4. Wait for extraction
Processing time: Usually 5-15 seconds
What works:
What doesn't work:
Pro tip: If extraction fails, copy the article text and add it as a "Text" source instead.
Audio best practices:
1. Make sure speech-to-text is enabled in Settings
2. Upload an MP3, WAV, or M4A file
3. The system transcribes it to text automatically
4. Processing time: ~1 minute per 5 minutes of audio
Example:
- 1-hour podcast → 12 minutes processing
- 10-minute recording → 2 minutes processing
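The rule of thumb above is a simple division. As a sketch, assuming a fixed ratio of 5 minutes of audio per minute of processing (actual times will vary), `estimate_transcription_minutes` (a hypothetical helper) reproduces both examples:

```python
def estimate_transcription_minutes(audio_minutes: float,
                                   audio_per_processing_minute: float = 5.0) -> float:
    """Rough estimate: ~1 minute of processing per 5 minutes of audio."""
    if audio_minutes < 0:
        raise ValueError("audio length cannot be negative")
    return audio_minutes / audio_per_processing_minute


if __name__ == "__main__":
    print(estimate_transcription_minutes(60))  # 12.0 (1-hour podcast)
    print(estimate_transcription_minutes(10))  # 2.0 (10-minute recording)
```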
Quality matters:
Tip: If audio quality is poor, the transcript may contain errors and the AI may misinterpret the content. You can correct the transcription manually if needed.
YouTube best practices:
Two ways to add:
Method 1: Direct URL
1. Copy YouTube URL: https://www.youtube.com/watch?v=...
2. Paste in "Web Link"
3. Click "Add"
4. The system extracts captions (if available) and the transcript
Method 2: Playlist
1. Paste the playlist URL
2. The system adds each video as a separate source
3. Each video is processed separately
4. This takes longer, since there are multiple videos
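The two methods differ in URL shape: single videos usually look like `youtube.com/watch?v=...` or `youtu.be/...`, while playlists use `youtube.com/playlist?list=...`. The sketch below is a hypothetical helper (`classify_youtube_url`) for checking which kind of link you have before pasting it; it is a heuristic, not the app's own detection logic. A watch URL that also carries `list=` (a video opened from a playlist) is treated here as a single video, which the app may handle differently.

```python
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url: str) -> str:
    """Heuristically label a YouTube URL as 'video', 'playlist', or 'unknown'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    query = parse_qs(parsed.query)

    if host == "youtu.be":
        # Short links carry the video ID in the path: https://youtu.be/<id>
        return "video" if parsed.path.strip("/") else "unknown"
    if host not in ("youtube.com", "m.youtube.com"):
        return "unknown"
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path == "/watch" and "v" in query:
        return "video"  # even if a list= parameter is also present
    return "unknown"


if __name__ == "__main__":
    for url in (
        "https://www.youtube.com/watch?v=abc123",       # placeholder video ID
        "https://www.youtube.com/playlist?list=PLxyz",  # placeholder playlist ID
        "https://youtu.be/abc123",
        "https://example.com/article",
    ):
        print(url, "->", classify_youtube_url(url))
```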
What's extracted:
Processing:
Text best practices:
1. Select "Text" when adding a source
2. Paste or type your content
3. The system processes it immediately
4. No wait time needed
Good for:
- Notes you want to reference
- Quotes from books
- Transcripts you have handy
- Quick research snippets
Click a source to see:
- Original file name/title
- When it was added
- Size and format
- Processing status
- Number of chunks
You can add to each source:
Why this matters:
After sources are added, you can:
Text search: "Find exact phrase"
Vector search: "Find conceptually similar"
Both search across all sources in the notebook.
Results show:
- Which source
- Which section
- Relevance score
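To see why the two search types return different results, here is a toy Python sketch. Text search is a case-insensitive substring match; vector search ranks chunks by cosine similarity to a query vector, a common relevance measure, though this guide does not specify how the app computes its scores. The 3-dimensional vectors are made up by hand; real embeddings come from a model and have hundreds or thousands of dimensions. All function names are hypothetical.

```python
import math


def text_search(chunks: list[str], phrase: str) -> list[int]:
    """Exact-phrase search (case-insensitive): indices of chunks containing `phrase`."""
    needle = phrase.lower()
    return [i for i, chunk in enumerate(chunks) if needle in chunk.lower()]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def vector_search(vectors: list[list[float]], query: list[float],
                  top_k: int = 2) -> list[tuple[int, float]]:
    """Rank chunks by similarity to the query; return (index, score) pairs."""
    scored = [(i, cosine_similarity(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    chunks = [
        "Solar panels convert sunlight into electricity.",
        "Wind turbines generate power from moving air.",
        "The recipe calls for two cups of flour.",
    ]
    vectors = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.0], [0.0, 0.1, 0.9]]  # hand-made
    query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "renewable energy"

    print(text_search(chunks, "renewable energy"))               # [] (phrase never appears)
    print([i for i, _ in vector_search(vectors, query_vector)])  # [0, 1]
```

The exact phrase "renewable energy" appears nowhere, so text search finds nothing, while vector search still surfaces the two energy-related chunks.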
You control how the AI accesses each source:
Full Content:
AI sees: Complete source text
Cost: 100% of tokens
Use when: Analyzing in detail, need precise citations
Example: "Analyze this methodology paper closely"
Summary Only:
AI sees: AI-generated summary (not full text)
Cost: ~10-20% of tokens
Use when: Background material, reference context
Example: "Use this as context but focus on the main source"
Not in Context:
AI sees: Nothing (excluded)
Cost: 0 tokens
Use when: Confidential, not relevant, or archived
Example: "Keep this in notebook but don't use in this conversation"
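To compare the three levels, here is a rough token estimate in Python. It assumes a summary costs 15% of the full text (the midpoint of the ~10-20% range above) and uses made-up token counts; `ContextLevel` and `estimate_context_tokens` are hypothetical names, not part of the app.

```python
from enum import Enum


class ContextLevel(Enum):
    FULL = "full"
    SUMMARY = "summary"
    EXCLUDED = "excluded"


# Assumption: a summary costs ~15% of the full text's tokens.
SUMMARY_RATIO = 0.15


def estimate_context_tokens(sources: list[tuple[int, ContextLevel]]) -> int:
    """Estimate tokens sent to the AI for (full_token_count, level) pairs."""
    total = 0.0
    for full_tokens, level in sources:
        if level is ContextLevel.FULL:
            total += full_tokens
        elif level is ContextLevel.SUMMARY:
            total += full_tokens * SUMMARY_RATIO
        # ContextLevel.EXCLUDED contributes 0 tokens
    return round(total)


if __name__ == "__main__":
    notebook = [
        (8_000, ContextLevel.FULL),      # methodology paper, analyzed closely
        (20_000, ContextLevel.SUMMARY),  # background material
        (5_000, ContextLevel.EXCLUDED),  # confidential, kept out of this chat
    ]
    print(estimate_context_tokens(notebook))  # 11000
    print(estimate_context_tokens([(t, ContextLevel.FULL) for t, _ in notebook]))  # 33000
```

In this example, switching background material to "Summary" and excluding the confidential source cuts the estimate to one third of the all-full-content cost.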
To set context levels:
1. Go to Chat
2. Click "Select Context Sources"
3. For each source:
   - Toggle it ON/OFF (include/exclude)
   - Choose a level (Full/Summary/Excluded)
4. Click "Save"
5. Chat now uses these settings
| Mistake | What Happens | How to Fix |
|---|---|---|
| Upload 200 sources at once | System gets slow, processing stalls | Add 10-20 at a time, wait for processing |
| Use full content for all sources | Token usage skyrockets, expensive | Use "Summary" or "Excluded" for background material |
| Add huge PDFs without splitting | Processing is slow, search results less precise | Consider splitting large PDFs into chapters |
| Keep generic or default titles | Can't distinguish between similar sources | Rename sources with descriptive titles right after uploading |
| Don't tag sources | Hard to find and organize later | Add tags immediately: "primary", "background", etc. |
| Mix languages in one source | Transcription/embedding quality drops | Keep each language in separate sources |
| Add the same source multiple times | Duplicates take up space and cause confusion | Add it once; reuse it in multiple chats/notebooks |
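If you have a large folder of files, planning batches up front makes the "10-20 at a time" advice easy to follow. This Python sketch only splits a list into batches of 15 (an arbitrary choice within that range); adding each batch and waiting for it to finish processing still happens in the app. `batches` and the file names are hypothetical.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int = 15) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    files = [f"paper_{n:03d}.pdf" for n in range(1, 201)]  # 200 hypothetical files
    for number, batch in enumerate(batches(files, size=15), start=1):
        # Add this batch in the app, then wait until every source shows Ready.
        print(f"Batch {number}: {len(batch)} files ({batch[0]} .. {batch[-1]})")
```

With 200 files this prints 14 batches: 13 batches of 15 and a final batch of 5.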
🟡 Processing
→ Source is being extracted and embedded
→ Wait 30 seconds to 3 minutes, depending on size
→ Don't use in Chat yet
🟢 Ready
→ Source is processed and searchable
→ Can use immediately in Chat
→ Can apply transformations
🔴 Error
→ Something went wrong
→ Common reasons:
- Unsupported file format
- File too large or corrupted
- Network timeout
⚪ Not in Context
→ Source added but excluded from Chat
→ Still searchable, not sent to AI
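The status legend can be read as a small lookup table: for each status, is the source searchable, and is it sent to the AI in Chat? The Python sketch below encodes that reading. It assumes Processing and Error sources are neither searchable nor sent to Chat; `SourceStatus`, `CAPABILITIES`, and the helper functions are hypothetical, not the app's internals.

```python
from enum import Enum


class SourceStatus(Enum):
    PROCESSING = "processing"          # 🟡 being extracted and embedded
    READY = "ready"                    # 🟢 processed and searchable
    ERROR = "error"                    # 🔴 something went wrong
    NOT_IN_CONTEXT = "not_in_context"  # ⚪ added but excluded from Chat


# (searchable, sent to the AI in Chat) for each status.
# Assumption: PROCESSING and ERROR sources are treated as neither.
CAPABILITIES: dict[SourceStatus, tuple[bool, bool]] = {
    SourceStatus.PROCESSING: (False, False),
    SourceStatus.READY: (True, True),
    SourceStatus.ERROR: (False, False),
    SourceStatus.NOT_IN_CONTEXT: (True, False),
}


def searchable(sources: dict[str, SourceStatus]) -> list[str]:
    """Names of sources that search can return."""
    return [name for name, status in sources.items() if CAPABILITIES[status][0]]


def sent_to_chat(sources: dict[str, SourceStatus]) -> list[str]:
    """Names of sources the AI sees in Chat."""
    return [name for name, status in sources.items() if CAPABILITIES[status][1]]


if __name__ == "__main__":
    notebook = {
        "methods.pdf": SourceStatus.READY,
        "interview.mp3": SourceStatus.PROCESSING,
        "old-notes": SourceStatus.NOT_IN_CONTEXT,
        "photo.webp": SourceStatus.ERROR,
    }
    print(searchable(notebook))    # ['methods.pdf', 'old-notes']
    print(sent_to_chat(notebook))  # ['methods.pdf']
```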
"Unsupported file type"
(for example, a .webp image)
"Processing timeout"
"Transcription failed"
"Web link won't extract"
Once you've added sources, you can:
Before adding sources, confirm:
Done! Sources are now ready for Chat, Search, Transformations, and more.