v3/docs/adr/ADR-114-dspy-ts-plugin.md
Status: Proposed (2026-05-11)
Date: 2026-05-11
Authors: claude (drafted with rUv)
Related: ADR-026 (3-tier model routing) · ADR-112 (MCP tool discoverability) · ADR-098 (plugin capability sync + token/performance/intelligence optimization) · ADR-G008 (optimizer promotion rule — "win twice to promote") · ADR-G009 (headless testing harness — Claude Code as evaluator) · RuVector / ReasoningBank intelligence pipeline (RETRIEVE → JUDGE → DISTILL → CONSOLIDATE) · dspy.ts (npm)
Supersedes: nothing
Ruflo today learns between tasks: the RuVector / ReasoningBank pipeline retrieves prior patterns, judges outcomes, distills learnings via LoRA, and consolidates with EWC++. Agent definitions, hook routing tables, and the model-routing tiers (ADR-026) are largely static — the prompts an agent runs with are hand-authored markdown, and there is no mechanism to optimize a prompt against a measurable objective.
dspy.ts ("DS.js — Declarative Self-learning JavaScript", ruvnet/dspy.ts, 248★, MIT, TypeScript) is the TypeScript port of Stanford's DSPy. Its model:
- Signatures: typed input → output contracts (e.g. { question: string } → { answer: string }).
- Modules: Predict, ChainOfThought, ReAct (with reflexion), Retrieve, composed into pipelines.
- Optimizers (BootstrapFewShot, MIPROv2, GEPA): given a metric and a small trainset, compile() tunes the prompt text and few-shot demonstrations automatically, with no hand-crafted prompt strings.
- A second compile() (or a second agent run) warm-starts from what the first one learned.

The key observation: dspy.ts is built on the same substrate ruflo already ships (agentdb, ReasoningBank, HNSW, RaBitQ). Integrating it does not bolt on a foreign dependency with its own storage and lifecycle; it wires a complementary capability onto infrastructure that is already present.
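The signature / module / optimizer model above can be illustrated with a toy sketch. Every name below (Program, Metric, bootstrapFewShot, the canned answers) is a hypothetical stand-in written for this illustration, not the dspy.ts API, and a lookup table plays the role of the language model. It only shows the idea behind few-shot bootstrapping: run the program over a trainset and keep the examples whose prediction passes the metric as demonstrations.

```typescript
// Hypothetical stand-ins for illustration; none of these names are the dspy.ts API.
type Example<I, O> = { input: I; output: O };
type Metric<O> = (predicted: O, expected: O) => number; // 1 = pass, 0 = fail

interface Program<I, O> {
  demos: Example<I, O>[]; // few-shot demonstrations carried with the program
  predict: (input: I, demos: Example<I, O>[]) => O;
}

// Toy bootstrap: keep training examples whose prediction passes the metric.
function bootstrapFewShot<I, O>(
  program: Program<I, O>,
  trainset: Example<I, O>[],
  metric: Metric<O>,
  maxDemos = 4,
): Program<I, O> {
  const demos: Example<I, O>[] = [];
  for (const example of trainset) {
    if (demos.length >= maxDemos) break;
    const predicted = program.predict(example.input, demos);
    if (metric(predicted, example.output) >= 1) {
      demos.push({ input: example.input, output: predicted });
    }
  }
  // Return a new program; the original is left untouched.
  return { demos, predict: program.predict };
}

// Signature from the text: { question: string } -> { answer: string }.
type Question = { question: string };
type Answer = { answer: string };

// A lookup table stands in for the language model (assumption for the demo).
const cannedAnswers = new Map<string, string>([
  ["2+2", "4"],
  ["capital of France", "Paris"],
]);

const baseProgram: Program<Question, Answer> = {
  demos: [],
  predict: (input) => ({ answer: cannedAnswers.get(input.question) ?? "unknown" }),
};

const exactMatch: Metric<Answer> = (predicted, expected) =>
  predicted.answer === expected.answer ? 1 : 0;

const trainset: Example<Question, Answer>[] = [
  { input: { question: "2+2" }, output: { answer: "4" } },
  { input: { question: "largest prime" }, output: { answer: "none" } },
  { input: { question: "capital of France" }, output: { answer: "Paris" } },
];

const compiled = bootstrapFewShot(baseProgram, trainset, exactMatch);
const demoQuestions = compiled.demos.map((d) => d.input.question).join("|");
```

Here the failing example ("largest prime") is dropped and the two passing ones become demonstrations. A real optimizer would also tune the prompt text, which this sketch leaves out.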
| Capability | Today in ruflo | With a dspy plugin |
|---|---|---|
| Optimize a prompt against a metric | — (prompts are hand-authored) | BootstrapFewShot / MIPROv2 / GEPA compile() |
| Typed LM-program contracts | implicit (free-text agent prompts) | Signature (input/output types, validated) |
| Few-shot demo selection | manual / ad-hoc | bootstrapped from a trainset, persisted |
| Reusable "compiled" agent behaviours | — | a compiled module is an artifact you can store, version, and re-load |
| Reflexion loops (ReAct + self-critique) | partial (intelligence JUDGE step) | first-class ReAct module with reflexion |
dspy.ts and ruflo's intelligence system both lean on AgentDB + ReasoningBank, and both are "self-learning". Without a sharp boundary, this becomes two systems doing fuzzily-similar things, confusing both Claude (which tool to call — see ADR-112) and contributors (where does this logic live?).
The boundary we draw:
- […] compile() deliberately).

A DSPy-compiled module can write into ReasoningBank (its trials are patterns), and ruflo intelligence can retrieve those, but they are not the same loop, and the plugin must not re-implement the intelligence pipeline.
Adopt dspy.ts as an optional Ruflo plugin — @claude-flow/plugin-dspy — that wraps the published dspy.ts npm package and exposes declarative prompt-program optimization through MCP tools, a slash skill, and an optional agent-loop hook. It shares ruflo's AgentDB instance rather than standing up its own store.
MCP tools (descriptions to follow the ADR-112 "use this over native when?" rule):
| Tool | Purpose | Native/internal overlap to disambiguate |
|---|---|---|
| dspy_signature_define | Register a typed input → output signature (persisted to AgentDB under a dspy/signatures namespace). | none |
| dspy_module_build | Wrap a signature in a module (Predict / ChainOfThought / ReAct / Retrieve / pipeline). | none |
| dspy_compile | Run an optimizer (BootstrapFewShot / MIPROv2 / GEPA) against a metric + trainset; persist trials + the optimized module. | vs. Task / hand-editing an agent prompt: use dspy_compile when you have a metric and examples and want the prompt tuned for you. |
| dspy_run | Execute a (compiled or raw) module against an input. | vs. agent_execute: use dspy_run for a single typed LM call inside a known signature; use agent_execute for open-ended agent work. |
| dspy_module_load / dspy_module_list | Re-load a previously compiled module by id; list compiled artifacts. | vs. memory_retrieve: these return executable modules, not raw text. |
| dspy_eval | Score a module against a holdout set (regression check for a compiled artifact). | none |
Skill: /dspy — guided workflow: define a signature → pick a module → supply a metric + a few examples → compile() → report the optimized module id + before/after metric. Mirrors the dspy.ts quick-start.
Hook (opt-in, default off): an AgentPromptCompile hook that, when an agent definition opts in (dspy: { signature, metric, trainset } in its frontmatter), runs dspy_compile once and caches the optimized prompt — so agents can be born optimized. This is gated and off by default because it adds a real cost (LM calls during compile) and must not surprise users.
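A minimal sketch of the opt-in gate and the compile-once cache described for the hook. Only the dspy: { signature, metric, trainset } frontmatter key comes from this ADR; AgentDefinition, CompileFn, and makePromptResolver are hypothetical names, and the compile step is a fake that counts its calls rather than a real optimizer run.

```typescript
// Hypothetical shapes; only the `dspy` frontmatter key is taken from the ADR.
interface DspyOptIn {
  signature: string;
  metric: string;
  trainset: string;
}

interface AgentDefinition {
  name: string;
  prompt: string;
  dspy?: DspyOptIn; // absent means the agent did not opt in
}

// Stand-in for an optimizer run; returns the optimized prompt text.
type CompileFn = (agent: AgentDefinition, optIn: DspyOptIn) => string;

function makePromptResolver(compile: CompileFn, hookEnabled: boolean) {
  const cache = new Map<string, string>();
  return (agent: AgentDefinition): string => {
    // Gate 1: the hook is off by default. Gate 2: the agent must opt in.
    if (!hookEnabled || !agent.dspy) return agent.prompt;

    // Key on everything that should invalidate a cached compile.
    const key = JSON.stringify([agent.name, agent.prompt, agent.dspy]);
    const cached = cache.get(key);
    if (cached !== undefined) return cached;

    const optimized = compile(agent, agent.dspy); // the costly step, run once
    cache.set(key, optimized);
    return optimized;
  };
}

// Demo with a fake compile step that counts how often it runs.
let compileCalls = 0;
const fakeCompile: CompileFn = (agent) => {
  compileCalls += 1;
  return `${agent.prompt} [optimized]`;
};

const reviewer: AgentDefinition = {
  name: "reviewer",
  prompt: "Review the diff.",
  dspy: { signature: "review", metric: "exact-match", trainset: "reviews.jsonl" },
};
const plain: AgentDefinition = { name: "plain", prompt: "Say hi." };

const resolvePrompt = makePromptResolver(fakeCompile, true);
const firstPrompt = resolvePrompt(reviewer);
const secondPrompt = resolvePrompt(reviewer); // served from the cache
const plainPrompt = resolvePrompt(plain); // no opt-in, prompt unchanged
const disabledPrompt = makePromptResolver(fakeCompile, false)(reviewer);
```

In this sketch the cache lives in memory; the ADR's intent of caching across sessions would need a persistent store, which is left out here.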
Storage: the plugin receives ruflo's AgentDB handle via the plugin context. It uses dedicated namespaces (dspy/signatures, dspy/modules, dspy/trials) so its data is inspectable and prunable independently, and so the agentdb-curator consolidation pipeline can see DSPy trials as candidate patterns.
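The namespacing idea can be sketched with a plain Map standing in for the shared store. This is an assumption made for illustration: the real AgentDB handle and its API are not shown in this ADR, and createDspyStorage is a hypothetical helper. The point is that plugin data written under the three dspy/ prefixes can be listed and pruned without touching the host's other records.

```typescript
// The three namespaces named in the ADR.
const NAMESPACES = ["dspy/signatures", "dspy/modules", "dspy/trials"] as const;
type DspyNamespace = (typeof NAMESPACES)[number];

// A Map stands in for the shared store handle; this is NOT the AgentDB API.
function createDspyStorage(store: Map<string, unknown>) {
  const keyOf = (ns: DspyNamespace, id: string): string => `${ns}/${id}`;

  const list = (ns: DspyNamespace): string[] => {
    const prefix = `${ns}/`;
    return Array.from(store.keys())
      .filter((key) => key.startsWith(prefix))
      .map((key) => key.slice(prefix.length));
  };

  const save = (ns: DspyNamespace, id: string, value: unknown): void => {
    store.set(keyOf(ns, id), value);
  };

  // Remove every record in one namespace; returns how many were removed.
  const prune = (ns: DspyNamespace): number => {
    const ids = list(ns); // snapshot first, then delete
    for (const id of ids) store.delete(keyOf(ns, id));
    return ids.length;
  };

  return { save, list, prune };
}

// The host already owns other records in the same store.
const sharedStore = new Map<string, unknown>([
  ["patterns/p1", { note: "owned by the host, not the plugin" }],
]);

const dspyStorage = createDspyStorage(sharedStore);
dspyStorage.save("dspy/signatures", "qa", { input: "question", output: "answer" });
dspyStorage.save("dspy/trials", "t1", { score: 0.6 });
dspyStorage.save("dspy/trials", "t2", { score: 0.8 });

const trialIds = dspyStorage.list("dspy/trials");
const removedTrials = dspyStorage.prune("dspy/trials");
```

After the prune, the host's patterns/p1 record and the stored signature remain; only the trials are gone.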
- @claude-flow/cli and the core packages must build and run with the plugin absent. The runtime pattern is await import('@claude-flow/plugin-dspy').catch(() => null), and per ADR-114's own guard the published package.json must not declare it as a hard dep of anything.
- dspy.ts's configureLM is bridged to ruflo's existing provider config (@claude-flow/providers) so there is one place to set keys/models.

[…] scripts/audit-plugin-packages.mjs)

- […] dspy.ts as a normal dependencies entry (it is published, so this is fine) and pins it to a ^-range against a published version.
- @claude-flow/* references in the plugin's package.json are peerDependenciesMeta.optional with >=X.Y.Z-0 ranges (so prerelease publishes resolve).
- exports / main / module must point at files the build actually emits; the plugin-package-audit CI job (check D) will enforce this.
- […] discovery.ts demoPluginRegistry fallback kept in sync.

- dspy.ts is AgentDB-native, so there is no impedance mismatch on storage, caching, or learning.
- […] scripts/audit-plugin-packages.mjs lands is a clean test of whether the guard actually prevents the #1902/#1903/#1904 class for greenfield plugins.
- […] dspy_run vs agent_execute, dspy_module_load vs memory_retrieve (ADR-112).
- […] agentdb line (if dspy.ts and ruflo pin different agentdb majors, the shared-handle story breaks). Mitigated by both being first-party.
- compile() has real cost. Optimizers make many LM calls. The opt-in hook and the /dspy skill must surface estimated cost before running, and compile() results must be cached aggressively (it already supports a fuzzy AgentDB cache via CachingLM).

- dspy.ts: https://github.com/ruvnet/dspy.ts · https://www.npmjs.com/package/dspy.ts
- […] dspy plugin advertises its capabilities; complements rather than overlaps the intelligence optimizer)
- […] dspy_* tool description must comply)
- […] compile()'d module is allowed to replace the current one; reuse this rule rather than inventing a DSPy-specific one)
- […] dspy_eval / compile() metrics that need a judging LM)
- scripts/audit-plugin-packages.mjs: plugin packaging guard the @claude-flow/plugin-dspy package must pass (#1902/#1903/#1904)

Causal edges (AgentDB):

- ADR-114 depends-on ADR-G008 (promotion semantics)
- ADR-114 depends-on ADR-G009 (eval harness)
- ADR-114 relates-to ADR-098 (capability registration)
- ADR-114 relates-to ADR-026/ADR-112 (must comply)
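The packaging constraints listed earlier (dspy.ts as a caret-pinned dependency, @claude-flow/* only as optional peers with >=X.Y.Z-0 ranges, entry points that the build actually emits) can be sketched as a small check. This is a hypothetical illustration, not the contents of scripts/audit-plugin-packages.mjs: the function name, the messages, and the version numbers in the demo are all assumptions, and treating any @claude-flow/* entry under dependencies as a violation is this sketch's reading of the rule.

```typescript
// Hypothetical audit sketch; not the real scripts/audit-plugin-packages.mjs.
interface PackageJson {
  name: string;
  main?: string;
  module?: string;
  exports?: unknown;
  dependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
  peerDependenciesMeta?: Record<string, { optional?: boolean }>;
}

const stripDotSlash = (path: string): string => path.replace(/^\.\//, "");

// Collect every string leaf from a (possibly nested) entry-point value.
function collectPaths(value: unknown): string[] {
  if (typeof value === "string") return [value];
  if (value === null || typeof value !== "object") return [];
  const record = value as Record<string, unknown>;
  let paths: string[] = [];
  for (const key of Object.keys(record)) paths = paths.concat(collectPaths(record[key]));
  return paths;
}

function auditPluginPackage(pkg: PackageJson, emittedFiles: string[]): string[] {
  const problems: string[] = [];
  const deps = pkg.dependencies ?? {};
  const peers = pkg.peerDependencies ?? {};
  const meta = pkg.peerDependenciesMeta ?? {};

  // dspy.ts: a normal dependency on a ^-range.
  const dspyRange = deps["dspy.ts"];
  if (dspyRange === undefined || !/^\^\d+\.\d+\.\d+$/.test(dspyRange)) {
    problems.push("dspy.ts must be a dependency pinned to a ^-range");
  }

  // @claude-flow/*: optional peers only, with prerelease-friendly ranges.
  for (const name of Object.keys(deps)) {
    if (name.startsWith("@claude-flow/")) problems.push(`${name} must not be a hard dependency`);
  }
  for (const name of Object.keys(peers)) {
    if (!name.startsWith("@claude-flow/")) continue;
    const range = peers[name];
    if (range === undefined || !/^>=\d+\.\d+\.\d+-0$/.test(range)) {
      problems.push(`${name} needs a >=X.Y.Z-0 range`);
    }
    const entry = meta[name];
    if (!entry || entry.optional !== true) problems.push(`${name} must be an optional peer`);
  }

  // exports / main / module must point at files the build emits.
  const emitted = new Set(emittedFiles.map(stripDotSlash));
  for (const path of collectPaths([pkg.main, pkg.module, pkg.exports])) {
    if (!emitted.has(stripDotSlash(path))) problems.push(`${path} is not emitted by the build`);
  }
  return problems;
}

// Demo manifests; all version numbers are made up for illustration.
const goodPackage: PackageJson = {
  name: "@claude-flow/plugin-dspy",
  main: "./dist/index.js",
  exports: { ".": { import: "./dist/index.js", types: "./dist/index.d.ts" } },
  dependencies: { "dspy.ts": "^1.0.0" },
  peerDependencies: { "@claude-flow/providers": ">=3.0.0-0" },
  peerDependenciesMeta: { "@claude-flow/providers": { optional: true } },
};
const badPackage: PackageJson = {
  name: "@claude-flow/plugin-dspy",
  main: "./dist/missing.js",
  dependencies: { "dspy.ts": "latest", "@claude-flow/cli": "^3.0.0" },
};

const emittedByBuild = ["dist/index.js", "dist/index.d.ts"];
const goodProblems = auditPluginPackage(goodPackage, emittedByBuild);
const badProblems = auditPluginPackage(badPackage, emittedByBuild);
```

The good manifest yields no findings, while the bad one trips three: an unpinned dspy.ts range, a hard @claude-flow/cli dependency, and a main entry that the build does not emit.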