explorations/ralph-wiggum-loop-integration.md
Status: IMPLEMENTED - The Agent Network with completion scorers IS the autonomous loop
The Ralph Wiggum loop is an autonomous agent execution pattern where an AI agent works persistently and iteratively until completion criteria are met. The core philosophy: "Let the agent fail repeatedly until it succeeds."
Completion checks are just MastraScorers that return 0 (not complete) or 1 (complete).
This unifies evals and completion checks: the same primitive, used in different contexts.
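As a plain-TypeScript sketch of that idea (no Mastra imports; the names here are illustrative, not library API), one binary scorer shape can serve both roles, grading a finished output as an eval and gating the loop as a completion check:

```typescript
// A binary scorer: 1 = pass/complete, 0 = fail/incomplete (illustrative shape).
type BinaryScorer = (output: string) => 0 | 1;

// As an eval, it grades a finished run's output.
const mentionsVitest: BinaryScorer = (output) =>
  output.toLowerCase().includes('vitest') ? 1 : 0;

// As a completion check, the same scorer decides whether the loop may stop.
function isComplete(output: string, scorers: BinaryScorer[]): boolean {
  return scorers.every((scorer) => scorer(output) === 1);
}
```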
```typescript
import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { createScorer } from '@mastra/core/evals';
import { Memory } from '@mastra/memory';
import { execSync } from 'child_process';

// Create a completion scorer
const testsScorer = createScorer({
  id: 'tests',
  description: 'Run unit tests to verify code works',
}).generateScore(async () => {
  try {
    execSync('npm test', { stdio: 'pipe' });
    return 1; // Tests passed
  } catch {
    return 0; // Tests failed
  }
});

const agent = new Agent({
  id: 'code-migrator',
  instructions: 'You help migrate code between frameworks.',
  model: openai('gpt-4o'),
  memory: new Memory(),
  agents: { coder: codingAgent }, // codingAgent defined elsewhere
  // Default network options
  defaultNetworkOptions: {
    maxSteps: 20,
    completion: {
      scorers: [testsScorer],
      strategy: 'all',
    },
  },
});

// Run network - uses default completion scorers
const result = await agent.network('Migrate all tests from Jest to Vitest');

for await (const chunk of result.fullStream) {
  if (chunk.type === 'network-validation-end') {
    console.log(`Completion: ${chunk.payload.passed ? '✅' : '❌'}`);
  }
}
```
Scorers are created using `createScorer` from the evals module. For completion, they should return 0 (not complete) or 1 (complete).

Scorers receive a `run` object with:
- `run.input` - CompletionContext with all network state
- `run.output` - the primitive's result (what we're evaluating)
- `run.runId` - the network run ID
- `run.requestContext` - custom context from the request

````typescript
import { createScorer } from '@mastra/core/evals';
import { execSync } from 'child_process';

// Simple scorer - run tests
const testsScorer = createScorer({
  id: 'tests',
  description: 'Run unit tests',
}).generateScore(async () => {
  try {
    execSync('npm test', { stdio: 'pipe' });
    return 1;
  } catch {
    return 0;
  }
});

// Context-aware scorer - access full network state
const progressScorer = createScorer({
  id: 'progress',
  description: 'Check progress',
}).generateScore(async ({ run }) => {
  // run.input is CompletionContext
  const ctx = run.input;
  console.log(`Iteration: ${ctx.iteration}`);
  console.log(`Task: ${ctx.originalTask}`);
  console.log(`Primitive: ${ctx.selectedPrimitive.id}`);
  console.log(`Result: ${run.output}`); // Same as ctx.primitiveResult

  // Access message history
  const hasCodeOutput = ctx.messages.some(m => m.content?.includes?.('```'));
  return hasCodeOutput ? 1 : 0;
});

// Use custom request context
const envScorer = createScorer({
  id: 'env-check',
  description: 'Environment-aware check',
}).generateScore(async ({ run }) => {
  const isProd = run.requestContext?.env === 'production';
  // Stricter checks in prod (runStrictChecks defined elsewhere)
  return isProd ? runStrictChecks() : 1;
});
````
Use this when you want LLM evaluation alongside code scorers:
```typescript
const taskCompleteScorer = createScorer({
  id: 'task-complete',
  description: 'LLM evaluates if task is complete',
  judge: {
    model: openai('gpt-4o-mini'),
    instructions: 'You evaluate task completion.',
  },
}).generateScore({
  description: 'Evaluate if the task is complete',
  createPrompt: ({ run }) => {
    const ctx = run.input; // CompletionContext
    return `
Original task: ${ctx.originalTask}
Latest result: ${ctx.primitiveResult}

Is this task complete? Return 1 if yes, 0 if no.
`;
  },
});

// Use LLM scorer + code scorer together
completion: {
  scorers: [taskCompleteScorer, testsScorer],
}
```
```typescript
interface CompletionConfig {
  // Scorers to run (return 0 or 1)
  scorers?: MastraScorer[];
  // 'all' = all must pass, 'any' = one must pass
  strategy?: 'all' | 'any';
  // Timeout for all scorers (ms)
  timeout?: number;
  // Run scorers in parallel
  parallel?: boolean;
  // Callback after scoring
  onComplete?: (result: CompletionRunResult) => void;
}
```
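As a sketch of how these options could combine (assumed semantics for illustration, not Mastra's actual implementation; `timeout` and `onComplete` are omitted for brevity), `strategy` and `parallel` might fold scorer results like this:

```typescript
// Illustrative only: a standalone model of strategy/parallel semantics.
type Scorer = { id: string; score: (output: string) => Promise<0 | 1> };

interface CompletionRunResult {
  complete: boolean;
  results: { id: string; score: 0 | 1 }[];
}

async function runCompletion(
  output: string,
  scorers: Scorer[],
  strategy: 'all' | 'any' = 'all',
  parallel = true,
): Promise<CompletionRunResult> {
  const results: { id: string; score: 0 | 1 }[] = [];
  if (parallel) {
    // Run every scorer concurrently
    const scores = await Promise.all(scorers.map((s) => s.score(output)));
    scorers.forEach((s, i) => results.push({ id: s.id, score: scores[i] }));
  } else {
    // Run scorers one after another
    for (const s of scorers) {
      results.push({ id: s.id, score: await s.score(output) });
    }
  }
  const pass = (r: { score: 0 | 1 }) => r.score === 1;
  const complete = strategy === 'all' ? results.every(pass) : results.some(pass);
  return { complete, results };
}
```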
Completion checks just answer: "Is this done?"
They do NOT generate the final result. The final result is the primitive's output.
| Config | What Runs |
|---|---|
| No `completion.scorers` | Default LLM check |
| `completion.scorers: [...]` | Your scorers |
All checks (default LLM or custom scorers) return:

```typescript
{
  complete: boolean,        // Is the task done?
  completionReason: string, // Why?
}
```
That's it. The network's result comes from the primitives, not from the checks.
```
┌──────────────────────────────────────────────────────────┐
│                      Agent Network                       │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Primitives execute → produce result                     │
│        │                                                 │
│        ▼                                                 │
│  Completion check: "Is this done?"                       │
│        │                                                 │
│  ┌─────┴─────┐                                           │
│  No          Yes                                         │
│  │           │                                           │
│  ▼           ▼                                           │
│  Loop        Return primitive's result                   │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
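The gate above can be sketched in plain TypeScript (illustrative, not Mastra source; the function and parameter names are made up): the loop's return value is always the primitive's output, and the check only decides when to stop.

```typescript
// Illustrative loop: scorers gate completion, primitives produce the result.
async function loopUntilComplete(
  runPrimitive: (iteration: number) => Promise<string>,
  isDone: (output: string) => Promise<boolean>,
  maxSteps = 20,
): Promise<string> {
  let output = '';
  for (let i = 0; i < maxSteps; i++) {
    output = await runPrimitive(i);          // primitive produces the result
    if (await isDone(output)) return output; // check only answers "done?"
  }
  return output; // maxSteps reached: still the primitive's last result
}
```

Note that the check never synthesizes a result of its own; even when `maxSteps` is exhausted, the caller still gets the primitive's last output.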
```typescript
// No scorers - uses built-in LLM evaluation
await agent.network('Build a landing page');

// Scorers replace the default LLM check entirely
await agent.network('Migrate to Vitest', {
  completion: {
    scorers: [testsScorer, buildScorer],
  },
});

// Mix code-based and LLM-based scorers; all must pass
await agent.network('Build API', {
  completion: {
    scorers: [
      testsScorer,   // Code-based
      qualityScorer, // LLM-based
      apiScorer,     // Code-based
    ],
    strategy: 'all',
  },
});

// 'any': a single passing scorer completes the run
await agent.network('Fix the bug', {
  completion: {
    scorers: [testsScorer, manualApprovalScorer],
    strategy: 'any', // Either tests pass OR manual approval
  },
});
```
```
┌─────────────────────────────────────────────────────────────────┐
│                       Agent Network Loop                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Routing Agent selects primitive (agent/workflow/tool)       │
│          ↓                                                      │
│  2. Execute selected primitive                                  │
│          ↓                                                      │
│  3. Run Completion Scorers                                      │
│     (default LLM scorer if none configured)                     │
│          ↓                                                      │
│   ┌──────┴──────┐                                               │
│ Score=0       Score=1                                           │
│ (not done)    (complete)                                        │
│   │             │                                               │
│   ▼             ▼                                               │
│ 4. Inject     5. Complete! ✅                                   │
│    feedback                                                     │
│   │                                                             │
│   └──────────► Loop back to step 1                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
```typescript
for await (const chunk of result.fullStream) {
  switch (chunk.type) {
    case 'network-validation-start':
      console.log(`Running ${chunk.payload.checksCount} scorers...`);
      break;
    case 'network-validation-end':
      console.log(`Complete: ${chunk.payload.passed ? '✅' : '❌'}`);
      for (const scorer of chunk.payload.results) {
        console.log(`  ${scorer.scorerName}: ${scorer.score}`);
      }
      break;
    case 'routing-agent-start':
      console.log(`Iteration ${chunk.payload.inputData.iteration}`);
      break;
  }
}
```
See also: the `createScorer` API for evals and completion.