# Entire Session Search
Semantic search over AI coding sessions captured by Entire, powered by CocoIndex.
Because CocoIndex is incremental, re-running after new sessions only processes what changed.
1. Check out the Entire checkpoint data:

   ```sh
   # From any repo where Entire is capturing sessions
   git worktree add entire_checkpoints entire/checkpoints/v1
   ```
2. Install deps:

   ```sh
   pip install -e .
   ```
3. Set env vars (or edit `.env`):

   ```sh
   # .env
   COCOINDEX_DB=./cocoindex.db
   POSTGRES_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
   ```
Build the index:

```sh
cocoindex update main.py
```
Search your sessions:

```sh
python main.py "how did I fix the auth bug"
```

Or start an interactive search:

```sh
python main.py
```
| Variable | Default | Description |
|---|---|---|
| `COCOINDEX_DB` | `./cocoindex.db` | SQLite path for CocoIndex internal state |
| `POSTGRES_URL` | `postgres://cocoindex:cocoindex@localhost/cocoindex` | Postgres connection for embedding/metadata tables |
| `TABLE_EMBEDDINGS` | `session_embeddings` | Embeddings table name |
| `TABLE_METADATA` | `session_metadata` | Metadata table name |
| `PG_SCHEMA_NAME` | `entire` | Postgres schema |
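A minimal sketch of reading these variables with the defaults from the table above; the actual `main.py` may load them differently (e.g. via python-dotenv), and the `CONFIG` dict name is illustrative:

```python
import os

# Defaults mirror the table above; override via the environment or .env.
CONFIG = {
    "COCOINDEX_DB": os.environ.get("COCOINDEX_DB", "./cocoindex.db"),
    "POSTGRES_URL": os.environ.get(
        "POSTGRES_URL", "postgres://cocoindex:cocoindex@localhost/cocoindex"
    ),
    "TABLE_EMBEDDINGS": os.environ.get("TABLE_EMBEDDINGS", "session_embeddings"),
    "TABLE_METADATA": os.environ.get("TABLE_METADATA", "session_metadata"),
    "PG_SCHEMA_NAME": os.environ.get("PG_SCHEMA_NAME", "entire"),
}
```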
```mermaid
graph TD
  checkpoints[Entire Checkpoints] --> walk[walk_dir]
  walk --> mount_each[mount_each]
  mount_each --> process_file[<b>process_file</b>]
  process_file -->|full.jsonl| parse[parse_transcript]
  process_file -->|prompt.txt| embed_prompt[embed directly]
  process_file -->|context.md| split[RecursiveSplitter]
  process_file -->|metadata.json| meta_table[(session_metadata)]
  parse --> embed[SentenceTransformer]
  embed_prompt --> embed
  split --> embed
  embed --> emb_table[(session_embeddings)]
```
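The branching in `process_file` amounts to a dispatch on the file name. A hypothetical sketch of that routing (the function and branch names are illustrative, not CocoIndex or Entire API):

```python
def route_file(name: str) -> str:
    """Map a checkpoint file to the processing branch shown in the graph."""
    branches = {
        "full.jsonl": "parse_transcript",   # transcript -> parse -> embed
        "prompt.txt": "embed_directly",     # short text, embedded as-is
        "context.md": "recursive_split",    # summary, chunked before embedding
        "metadata.json": "metadata_table",  # structured fields -> session_metadata
    }
    return branches.get(name, "skip")       # e.g. content_hash.txt is skipped

print(route_file("context.md"))  # recursive_split
```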
```
<checkpoint_id[:2]>/<checkpoint_id[2:]>/<session_idx>/
├── metadata.json     # token counts, files touched, timestamps
├── full.jsonl        # conversation transcript
├── prompt.txt        # user's initial prompt
├── context.md        # AI-generated session summary
└── content_hash.txt  # content fingerprint (skipped)
```

Note: if one prompt spans multiple commits, each commit gets its own checkpoint carrying the same token data, so don't sum token counts across checkpoints.
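The two-level sharding and the token-counting caveat above can be sketched in a few lines. This is an assumption-laden illustration: the helper names, and the `prompt_id`/`tokens` fields used to de-duplicate, are hypothetical, not Entire's documented schema:

```python
from pathlib import Path


def checkpoint_dir(root: Path, checkpoint_id: str, session_idx: int) -> Path:
    """Shard checkpoint dirs by the first two chars of the checkpoint id."""
    return root / checkpoint_id[:2] / checkpoint_id[2:] / str(session_idx)


def total_tokens(checkpoints: list[dict]) -> int:
    """Sum token counts, counting each prompt once even when it spans
    several commits (each commit repeats the same token data)."""
    seen: set[str] = set()
    total = 0
    for cp in checkpoints:
        if cp["prompt_id"] in seen:  # hypothetical field names
            continue
        seen.add(cp["prompt_id"])
        total += cp["tokens"]
    return total


print(checkpoint_dir(Path("entire_checkpoints"), "ab12cd", 0))
# entire_checkpoints/ab/12cd/0
```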