Back to Oh My Openagent

Investigating opencode sessions in the DB (Case D)

.agents/skills/opencode-qa/references/db-investigation.md

4.6.05.8 KB
Original Source

Investigating opencode sessions in the DB (Case D)

Table of Contents

Where the data lives

Active DB path: opencode db path (on this machine ~/.local/share/opencode/opencode.db).

Derived from XDG data dir + "opencode" + "opencode.db" (or "opencode-<channel>.db" on non-stable channels). Override via env OPENCODE_DB (:memory:, absolute, or relative-to-data).

It is large (tens of GB) because the part table stores tool output. The session table is small (~21k rows; full scans are milliseconds).

Access methods

Preferred: opencode db "<SQL>" --format json (WAL-safe; resolves the active DB). --format tsv default. Bare opencode db opens an interactive sqlite3 shell. opencode db path prints the file.

Raw fallback for EXPLAIN/perf: sqlite3 "$(opencode db path)" "<SQL>". Reads are safe alongside a running opencode (WAL allows concurrent readers).

Schema (the tables that matter)

Note the ACTIVE storage in v1.15.13 is the LEGACY pair message + part; the V2 session_message table exists but is EMPTY in this version (a recent session showed 43 message rows, 169 part rows, 0 session_message). Document both but make clear message/part is what holds current data.

session

ColumnNotes
idPK, 'ses_' prefix
project_idFK
parent_id
slug
directory
titleNOT NULL
version
agent
modelJSON {providerID, modelID}
cost
tokens_input
tokens_output
tokens_reasoning
tokens_cache_read
tokens_cache_write
metadataJSON
time_createdepoch MILLISECONDS
time_updatedepoch MILLISECONDS
time_archived

Indexes: project_id, parent_id, workspace_id. NO index on title or time_created.

message (legacy)

ColumnNotes
id'msg_' prefix
session_idFK -> session, cascade
time_created
time_updated
dataJSON: {role, time:{created}, summary:{title}, agent, model:{providerID,modelID}, variant}

part (legacy)

ColumnNotes
id'prt_' prefix
message_idFK -> message, cascade
session_iddenormalized; index part_session_idx
dataJSON

Part types seen: text, reasoning, tool, step-start, step-finish. A text part is {"type":"text","text":"..."}.

Other tables

session_message (V2, currently empty), todo, project, permission, session_share, workspace, event.

Time conversion

time_created/time_updated are epoch milliseconds. Convert:

sql
datetime(time_created/1000,'unixepoch')

Tested query patterns

1. By id (instant)

Script: scripts/db-session-by-id.sh <ses_id>

sql
SELECT
  id,
  slug,
  title,
  directory,
  agent,
  json_extract(model,'$.modelID') AS model,
  json_extract(model,'$.providerID') AS provider,
  cost,
  tokens_input,
  tokens_output,
  datetime(time_created/1000,'unixepoch') AS created,
  datetime(time_updated/1000,'unixepoch') AS updated
FROM session
WHERE id='<ses_id>'

2. By name/title (0.006s over 21k rows)

Script: scripts/db-session-by-name.sh "<substr>" [limit]

sql
SELECT
  id,
  title,
  datetime(time_created/1000,'unixepoch') AS created
FROM session
WHERE title LIKE '%<substr>%'
ORDER BY time_created DESC
LIMIT <N>

3. By message text

Script: scripts/db-session-by-text.sh (--session <id>|--recent <N>|--since "<window>") [--limit N] "<text>"

CRITICAL performance note: text lives in part.data JSON, and part is the multi-GB table, so an UNBOUNDED text scan is refused by the script. Always scope it.

Scoped within one session (indexed, ~0.017s)

sql
SELECT
  p.session_id,
  p.id,
  substr(json_extract(p.data,'$.text'),1,120)
FROM part p
WHERE p.session_id='<id>'
  AND json_extract(p.data,'$.type')='text'
  AND json_extract(p.data,'$.text') LIKE '%<text>%'
LIMIT 50

Bounded to the N most-recent sessions (worst-case ~0.02s)

sql
SELECT
  p.session_id,
  p.id,
  substr(json_extract(p.data,'$.text'),1,120)
FROM part p
WHERE p.session_id IN (
  SELECT id FROM session ORDER BY time_created DESC LIMIT <N>
)
  AND json_extract(p.data,'$.type')='text'
  AND json_extract(p.data,'$.text') LIKE '%<text>%'
LIMIT 50

AVOID this naive form (took ~50s)

A JOIN FROM session s JOIN part p ON p.session_id=s.id WHERE s.time_created >= X ... scans oldest sessions first. The IN-subquery (newest-first, drives part_session_idx) is the right shape because it lets SQLite use the index on part.session_id with a small, ordered set of recent session IDs, rather than scanning the entire part table from the oldest sessions upward.

4. Full export

Script: scripts/export-roundtrip.sh <ses_id> wraps opencode export <id> 2>/dev/null -> clean JSON {info:{id,slug,projectID,directory,title,tokens,time,...}, messages:[...]} (banner goes to stderr).

5. Listing recent sessions

sql
SELECT
  id,
  title,
  datetime(time_created/1000,'unixepoch') created
FROM session
ORDER BY time_created DESC
LIMIT 100

The 25 GB caveat

Global text search over all parts is a full scan of the largest table and can take a long time. The bundled script refuses it; you must pass --session, --recent, or --since. Title search (session table) is always cheap.

Verifying read-only

All Case D operations are reads. To prove a QA pass did not mutate the DB, compare before and after:

bash
sqlite3 "$(opencode db path)" "SELECT count(*) FROM session"

These queries are exactly what the scripts/db-*.sh helpers run; each ships a --self-test.