plans/codex_code_explorer.md
Add a default-off Settings experiment for a read-only local-agent tool named explore_code. The tool will use the TypeScript compiler API in an off-thread worker to build an in-memory code graph for TypeScript projects and return focused, line-numbered source windows for codebase-understanding queries.
This is intentionally an experiment. It should only be enabled for an app when the target app has TypeScript installed and configured, so the agent does not advertise or call a code explorer that would behave inconsistently on non-TypeScript projects.
- `list_files`, `grep`, and `read_file` calls.
- `grep`, `read_file`, or `code_search`.

Add a top-level user setting:
enableCodeExplorer: boolean; // default false
Implementation requirements:
- Enable code explorer.
- `ExperimentsSchema`; use the same top-level settings pattern as other newer experiments.
- `explore_code` must be absent from local-agent tool definitions, prompt hints, and request snapshots.

Tool name:
explore_code
Input schema:
{
query: string;
app_name?: string;
tsconfig_path?: string;
max_files?: number; // default 5, min 1, max 8
max_depth?: number; // default 2, min 0, max 3
}
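The defaults and bounds in the schema could be applied in one normalization step before the tool does any work. This is a minimal sketch, not the planned implementation: `normalizeExploreCodeInput`, `boundedInt`, and `ExploreCodeValidationError` are hypothetical names, the error class stands in for a `DyadError` with `DyadErrorKind.Validation`, and rejecting (rather than clamping) out-of-range values and empty queries is an assumption.

```typescript
// Mirrors the input schema above.
interface ExploreCodeInput {
  query: string;
  app_name?: string;
  tsconfig_path?: string;
  max_files?: number;
  max_depth?: number;
}

// Hypothetical normalized shape with defaults applied.
interface NormalizedExploreCodeInput {
  query: string;
  appName: string | undefined;
  tsconfigPath: string | undefined;
  maxFiles: number;
  maxDepth: number;
}

// Hypothetical stand-in for DyadError with DyadErrorKind.Validation.
class ExploreCodeValidationError extends Error {}

// Returns the fallback when the value is omitted; rejects non-integers
// and values outside [min, max] (assumption: reject instead of clamp).
function boundedInt(
  name: string,
  value: number | undefined,
  fallback: number,
  min: number,
  max: number,
): number {
  if (value === undefined) return fallback;
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new ExploreCodeValidationError(
      `${name} must be an integer between ${min} and ${max}`,
    );
  }
  return value;
}

function normalizeExploreCodeInput(
  input: ExploreCodeInput,
): NormalizedExploreCodeInput {
  const query = input.query.trim();
  if (query.length === 0) {
    throw new ExploreCodeValidationError("query must not be empty");
  }
  return {
    query,
    appName: input.app_name,
    tsconfigPath: input.tsconfig_path,
    maxFiles: boundedInt("max_files", input.max_files, 5, 1, 8),
    maxDepth: boundedInt("max_depth", input.max_depth, 2, 0, 3),
  };
}
```

Normalizing once at the tool boundary would let the worker and the pure core assume the caps are already valid.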
Tool behavior:
- `modifiesState`.
- `tsconfig_path` as app-relative and non-escaping.
- `app_name` through the existing local-agent app context flow.

Tool output should be Markdown suitable for both the model and chat rendering:
## Code exploration: <query>
Found <symbolCount> symbols across <fileCount> files.
Indexed <indexedFileCount> files in <indexMs>ms; searched in <searchMs>ms.
#### src/path/file.ts - AuthService.login, createSession
```ts
42 export class AuthService {
43 async login(...) {
44 return createSession(...);
45 }
```
Include truncation notes when file, line, or character caps are hit.
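A line-numbered window with a line cap might be rendered roughly like this. It is only a sketch: `renderSourceWindow`, its return shape, and the wording of the truncation note are all hypothetical, and the plan does not fix the actual cap values here.

```typescript
// Hypothetical result shape: the numbered lines plus an optional note.
interface RenderedWindow {
  body: string;
  omittedLines: number;
  note: string | undefined;
}

// Renders lines startLine..endLine (1-based, inclusive), prefixing each
// with its original line number, and stops after maxLines lines.
function renderSourceWindow(
  sourceLines: string[],
  startLine: number,
  endLine: number,
  maxLines: number,
): RenderedWindow {
  const first = Math.max(1, startLine);
  const last = Math.min(sourceLines.length, endLine);
  const cappedLast = Math.min(last, first + maxLines - 1);

  const numbered: string[] = [];
  for (let n = first; n <= cappedLast; n++) {
    numbered.push(`${n} ${sourceLines[n - 1]}`);
  }

  const omittedLines = Math.max(0, last - cappedLast);
  return {
    body: numbered.join("\n"),
    omittedLines,
    // Assumed note wording; only emitted when the line cap was hit.
    note:
      omittedLines > 0
        ? `Truncated: ${omittedLines} more line(s) not shown.`
        : undefined,
  };
}
```

Keeping the original line numbers in the body lets the agent follow up with a precise `read_file` range if the window was cut short.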
The tool should only be enabled for a target app when all of these are true:
- `tsconfig_path`, `tsconfig.app.json`, or `tsconfig.json`.

If any condition fails:

- `explore_code` in the tool set for that app when the app context is known before request construction.
- `DyadErrorKind.Precondition` and a concise message such as `Code explorer requires TypeScript to be installed and configured in this app.`

This differs from the existing TS checker fallback behavior: `explore_code` should not use bundled Dyad TypeScript to explore arbitrary projects. The experiment is meant to measure TypeScript-aware navigation on projects that are actually configured for TypeScript.
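One way to picture the readiness gate is a pure function that takes the probe results as inputs, so the same check can decide both whether to register the tool and whether to fail a call. This is a sketch under assumptions: `ReadinessProbe`, `Readiness`, and `checkCodeExplorerReadiness` are hypothetical names, and falling through to the default config names when an explicit `tsconfig_path` does not exist is an assumption the plan does not settle.

```typescript
// Message taken from the plan's example wording.
const PRECONDITION_MESSAGE =
  "Code explorer requires TypeScript to be installed and configured in this app.";

// Hypothetical probe: the caller supplies how TypeScript resolution and
// file existence are actually checked for the target app.
interface ReadinessProbe {
  hasAppLocalTypeScript: boolean;
  fileExists(appRelativePath: string): boolean;
}

type Readiness =
  | { ready: true; tsconfigPath: string }
  | { ready: false; reason: string };

function checkCodeExplorerReadiness(
  probe: ReadinessProbe,
  tsconfigPath?: string,
): Readiness {
  if (!probe.hasAppLocalTypeScript) {
    return { ready: false, reason: PRECONDITION_MESSAGE };
  }
  // Candidate order: explicit tsconfig_path, tsconfig.app.json, tsconfig.json.
  const candidates = [tsconfigPath, "tsconfig.app.json", "tsconfig.json"].filter(
    (c): c is string => typeof c === "string" && c.length > 0,
  );
  const found = candidates.find((c) => probe.fileExists(c));
  return found === undefined
    ? { ready: false, reason: PRECONDITION_MESSAGE }
    : { ready: true, tsconfigPath: found };
}
```

A caller could omit `explore_code` from the tool set when `ready` is false, and map `reason` onto a `DyadErrorKind.Precondition` error if the tool is invoked anyway.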
Add a dedicated worker modeled after the existing TypeScript checker worker:
The worker should load TypeScript from the target app using Node resolution from the app root. If that fails, report the precondition error instead of falling back to Dyad's bundled TypeScript.
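Resolving TypeScript from the app root can be done with Node's `createRequire`, which builds a `require` function anchored at a given file path. The sketch below assumes this approach; `loadAppTypeScript` and `CodeExplorerPreconditionError` are hypothetical names, and the error class stands in for a `DyadError` with `DyadErrorKind.Precondition`.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// Hypothetical stand-in for DyadError with DyadErrorKind.Precondition.
class CodeExplorerPreconditionError extends Error {}

const TS_PRECONDITION_MESSAGE =
  "Code explorer requires TypeScript to be installed and configured in this app.";

// Loads the `typescript` module as seen from the app root, never from Dyad.
// The anchor file does not need to exist; only its directory matters.
function loadAppTypeScript(appRoot: string): unknown {
  const appRequire = createRequire(path.resolve(appRoot, "package.json"));
  let resolved: string;
  try {
    resolved = appRequire.resolve("typescript");
  } catch {
    // No fallback to a bundled compiler: report the precondition instead.
    throw new CodeExplorerPreconditionError(TS_PRECONDITION_MESSAGE);
  }
  return appRequire(resolved);
}
```

Node resolution walks up parent `node_modules` directories, so a TypeScript install hoisted to a monorepo root would also satisfy this lookup. Whether that should count as "installed in this app" is a decision the plan leaves open.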
Recommended implementation split:
- `shared/code_explorer_types.ts`: worker input/output and rendered result types only.
- `workers/code_explorer/core/*`: pure TypeScript compiler API core with no Electron imports.
- `workers/code_explorer/code_explorer_worker.ts`: thin `parentPort` wrapper around the pure core.
- `src/ipc/processors/code_explorer.ts`: main-process worker orchestrator.
- `src/pro/main/ipc/handlers/local_agent/tools/explore_code.ts`: local-agent tool wrapper.

The pure core should be importable from unit tests without launching Electron. Every core function should receive the app-local `typescript` module as an injected dependency instead of importing `typescript` directly.
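To illustrate the injection rule, a core helper could type the injected module against a small structural subset of the compiler API, which lets unit tests pass a stub instead of a real compiler. `TsConfigApi` and `listProjectFiles` are hypothetical names; `readConfigFile`, `parseJsonConfigFileContent`, and `sys.readFile` are existing members of the TypeScript module, declared here only in the reduced form this sketch needs.

```typescript
// Hypothetical structural subset of the injected `typescript` module.
interface TsConfigApi {
  sys: { readFile(path: string): string | undefined };
  readConfigFile(
    fileName: string,
    readFile: (path: string) => string | undefined,
  ): { config?: unknown; error?: unknown };
  parseJsonConfigFileContent(
    json: unknown,
    host: unknown,
    basePath: string,
  ): { fileNames: string[]; errors: unknown[] };
}

// Returns the root file names the selected tsconfig resolves to.
// `ts` is injected; this module never imports `typescript` itself.
function listProjectFiles(
  ts: TsConfigApi,
  configPath: string,
  basePath: string,
): string[] {
  const read = ts.readConfigFile(configPath, (p) => ts.sys.readFile(p));
  if (read.error !== undefined) {
    throw new Error(`Could not read tsconfig at ${configPath}`);
  }
  const parsed = ts.parseJsonConfigFileContent(read.config, ts.sys, basePath);
  return parsed.fileNames;
}
```

In the worker, the value passed as `ts` would be the module loaded from the target app; in tests it can be a hand-written object with the same three members.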
Worker build wiring:
- `code_explorer_worker.js` is emitted next to the existing worker outputs.
- `typescript` external so resolution comes from the target app, not from Dyad.

Config selection order:

1. `tsconfig_path`, if provided.
2. `tsconfig.app.json`.
3. `tsconfig.json`.

Project reference handling:
Build an in-memory graph from the TypeScript Program / Language Service:
Suggested pure-core modules:
- `program.ts`: discover/parse tsconfig and create the TypeScript Program.
- `indexer.ts`: walk source files, create graph nodes, and populate graph edges.
- `search.ts`: extract query terms and score candidate symbols.
- `expand.ts`: run bounded bidirectional graph traversal.
- `render.ts`: group selected symbols by file and emit capped source windows.
- `index.ts`: single `exploreCode(ts, input)` entrypoint used by the worker and tests.

Follow the approach from `~/codegraph/NOTES.md`:
- `test`, `spec`, or `vitest`.
- `max_files`.

Source-window caps:

- `max_files`: 5.

Add support for a new custom tag:
<dyad-explore-code>...</dyad-explore-code>
Renderer behavior:
- `dyad-explore-code`.

Use `DyadError` for expected user/project failures:

- `DyadErrorKind.Precondition`: missing app-local TypeScript, missing tsconfig, no source files, unsupported project shape.
- `DyadErrorKind.Validation`: invalid arguments, invalid bounds, escaping `tsconfig_path`.
- `DyadErrorKind.NotFound`: unknown `app_name`, via existing app resolution.

Unexpected compiler API failures should still surface as bugs with enough context for debugging, without dumping large source content into logs.
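The "escaping `tsconfig_path`" validation case could be checked by resolving the path against the app root and confirming the result stays inside it. This is a sketch only: `resolveAppRelativePath` and `CodeExplorerValidationError` are hypothetical names, the error class stands in for a `DyadError` with `DyadErrorKind.Validation`, and the check is purely lexical, so it does not follow symlinks.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for DyadError with DyadErrorKind.Validation.
class CodeExplorerValidationError extends Error {}

// Resolves an app-relative path and rejects anything that is absolute,
// points at the app root itself, or climbs out of the app root.
function resolveAppRelativePath(appRoot: string, candidate: string): string {
  if (path.isAbsolute(candidate)) {
    throw new CodeExplorerValidationError("tsconfig_path must be app-relative");
  }
  const root = path.resolve(appRoot);
  const resolved = path.resolve(root, candidate);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === "" ||
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new CodeExplorerValidationError(
      "tsconfig_path must stay inside the app directory",
    );
  }
  return resolved;
}
```

Comparing against `".." + path.sep` rather than a bare `".."` prefix avoids rejecting legitimate names that merely begin with two dots.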
Unit tests:
- `enableCodeExplorer`.
- `explore_code` by default and includes it when the experiment is enabled and the app is TypeScript-ready.
- `tsconfig_path`.
- `DyadErrorKind.Precondition`.
- `dyad-explore-code`.

E2E test:

- `tsconfig.json`
- `explore_code` with a query such as `login session auth service flow`.
- `explore_code` when the setting is off
- `enableCodeExplorer`
- `dyad-explore-code` card is visible
- `npm run build` before the E2E test, then run the targeted Playwright spec.

Add a manual benchmark command:
npm run benchmark:code-explorer
The benchmark should launch packaged Dyad programmatically with a real LLM provider and run paired baseline/experiment trials.
Repositories:
- https://github.com/excalidraw/excalidraw
- https://github.com/mattermost/mattermost, app root `webapp/channels`
- https://github.com/calcom/cal.com
- https://github.com/supabase/supabase

Trial setup:

- `DYAD_PRO_KEY` from `.env` and use it to authenticate Dyad Engine calls.
- `enableCodeExplorer=false`.
- `enableCodeExplorer=true`.

Benchmark prompts:
Metrics:
Instrumentation:
- `DYAD_BENCHMARK_RUN_ID`.
- `benchmark-results/code-explorer/<run-id>/`.
- `summary.json` and `summary.md`.

Programmatic Dyad driver:

- `.env`, require `DYAD_PRO_KEY`, and configure the app to use Dyad Engine for both arms.

Repeat count:

- `--repeats=N` for deeper studies when we want stronger evidence against real LLM latency, routing, and output-length variance.

Success bar:
- `--repeats=N` is greater than 1, use medians per task before aggregating.
- `DYAD_PRO_KEY` in `.env`, packaged build requirement, and whether repo dependencies should be installed before running.
- `DyadErrorKind.Precondition`.
- `node_modules` and `.d.ts`, scope benchmark apps to relevant subdirectories, cap graph expansion, and terminate idle workers.
- `--repeats=N`, pinned repo commits, fixed prompts, rubric checks, and paired baseline/experiment runs.
- `DYAD_BENCHMARK_RUN_ID` is set.
- `enableCodeExplorer`.
- `explore_code` local-agent tool and register it behind the experiment/readiness gate.
- `dyad-explore-code`.
- `DYAD_BENCHMARK_RUN_ID`.
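The per-task median rule for `--repeats=N` runs can be sketched as below. The `Trial` shape, the generic `value` metric, and the function names are hypothetical, and the sketch stops at per-task medians because the plan does not specify how they are aggregated afterwards.

```typescript
type Arm = "baseline" | "experiment";

// Hypothetical trial record: one metric value for one task in one arm.
interface Trial {
  task: string;
  arm: Arm;
  value: number;
}

// Median of a non-empty list; averages the two middle values when the
// count is even. Sorts a copy so the input is left untouched.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Collapses repeated trials into one median per task and arm, so that a
// single slow or verbose LLM response does not dominate a task's result.
function medianPerTask(trials: Trial[]): Map<string, Partial<Record<Arm, number>>> {
  const grouped = new Map<string, Record<Arm, number[]>>();
  for (const trial of trials) {
    const bucket = grouped.get(trial.task) ?? { baseline: [], experiment: [] };
    bucket[trial.arm].push(trial.value);
    grouped.set(trial.task, bucket);
  }
  const result = new Map<string, Partial<Record<Arm, number>>>();
  for (const [task, bucket] of grouped) {
    const entry: Partial<Record<Arm, number>> = {};
    if (bucket.baseline.length > 0) entry.baseline = median(bucket.baseline);
    if (bucket.experiment.length > 0) entry.experiment = median(bucket.experiment);
    result.set(task, entry);
  }
  return result;
}
```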