remoting/tools/magi-mode/SKILL_TEST.md
This document describes the protocol used to validate the MAGI sub-agents and prevent regressions in protocol execution. It defines "unit" tests for each phase of the MAGI protocol. The goal is to verify that sub-agents adhere to their mandates, follow the TONE MANDATE, produce valid schema-compliant output, and generate buildable code.
Rather than testing the entire protocol end-to-end (which is slow and prone to flakiness), we test each phase independently by providing mock inputs and verifying specific expected outputs.
To allow for real builds without risking side effects on the actual codebase, all tests operate on static, checked-in dummy files located in `remoting/tools/magi-mode/tests/testdata/`. This directory contains its own `BUILD.gn` file so that real builds can be run on the test outputs.
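For illustration, the directory might contain something like the following; the specific test file is the example used later in this document, and only `BUILD.gn` plus the naming conventions described below are actually prescribed:

```
remoting/tools/magi-mode/tests/testdata/
  BUILD.gn
  complex_uaf.cc.magi.test
```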
To prevent automated tooling (like static analyzers) from flagging intentional flaws in the test data, and to prevent humans from trying to fix them:

- Test data files use the suffix `.magi.test` (e.g., `complex_uaf.cc.magi.test`).
- Filenames are descriptive (e.g., `complex_uaf.cc.magi.test`) to make it clear to humans what is being tested.
- During test execution, files are given neutral names (e.g., `bind_post_task_helper.cc`) to prevent the agent from anchoring on the filename.
- Files may contain comments of the form `// MAGI: <comment>` to describe the flaw or reproduction steps for humans.
- The `BUILD.gn` file MUST contain a GN action to copy these files, rename them, and strip out the `// MAGI:` comments (a sketch of such a strip step follows this list).
- All targets in `BUILD.gn` MUST be marked `testonly = true`.
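A minimal sketch of the copy-and-strip step that such a GN action could invoke. The script name, its flags, and the choice of Python are assumptions for illustration, not part of the checked-in tooling:

```python
#!/usr/bin/env python3
# strip_magi_comments.py (hypothetical helper): copies a .magi.test file to
# its neutral build-time name, dropping any line that starts with "// MAGI:".
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)   # e.g., complex_uaf.cc.magi.test
    parser.add_argument("--output", required=True)  # e.g., bind_post_task_helper.cc
    args = parser.parse_args()

    with open(args.input) as src, open(args.output, "w") as dst:
        for line in src:
            # Drop the human-facing annotations so neither the agent nor
            # static analyzers ever see the intentional flaws they describe.
            if line.lstrip().startswith("// MAGI:"):
                continue
            dst.write(line)


if __name__ == "__main__":
    main()
```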
Test cases are defined in `magi_phase_[N]_tests.json`, conforming to `magi_test_schemas.json`. Each test case includes:
- `files_created`: List of files that should be created.
- `content_patterns`: Regex patterns that must match file content.
- `buildable`: Boolean indicating whether the output must compile.
- `valid_json`: Boolean indicating whether the output must be valid JSON.
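For illustration, a single test case might look like the following. The field values, the test name, and the nesting of the expectation fields under `expected_outputs` are assumptions, grounded only in the field names this document mentions:

```json
{
  "name": "phase_1_detects_use_after_free",
  "base_inputs": {
    "source_file": "bind_post_task_helper.cc"
  },
  "override_inputs": {},
  "expected_outputs": {
    "files_created": ["analysis_report.json"],
    "content_patterns": ["use-after-free", "bind_post_task_helper\\.cc"],
    "buildable": false,
    "valid_json": true
  }
}
```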
Run a specific test file:

```shell
python3 remoting/tools/magi-mode/run_magi_tests.py --tests \
    remoting/tools/magi-mode/tests/magi_phase_1_tests.json
```
Run all tests using a shell loop:

```shell
for f in remoting/tools/magi-mode/tests/magi_*_tests.json; do \
    python3 remoting/tools/magi-mode/run_magi_tests.py --tests "$f"; done
```
If the automated test runner fails or hangs (e.g., due to environment issues with `agentapi`), an agent can manually execute the tests by interpreting the JSON files:

1. Construct the prompt from the `base_inputs` and `override_inputs` in the test case.
2. Invoke the `invoke_subagent` tool with the constructed prompt.
3. Verify the result against the `expected_outputs` in the test case (e.g., valid JSON, specific content patterns).

This ensures that tests can still be run even if the automation script is not fully functional in the current environment.
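A minimal sketch of that manual loop, assuming the test-case layout shown earlier. The merge semantics (overrides winning over base inputs), the prompt construction, and the way `invoke_subagent` is called are all assumptions:

```python
import json
import re


def run_case_manually(test_case, invoke_subagent):
    """Interprets one test case by hand: build the prompt, invoke the
    sub-agent, then check the output against expected_outputs."""
    # Assumed merge semantics: override_inputs take precedence over base_inputs.
    inputs = {**test_case.get("base_inputs", {}),
              **test_case.get("override_inputs", {})}
    prompt = json.dumps(inputs)  # Prompt construction is an assumption.

    output = invoke_subagent(prompt)

    expected = test_case["expected_outputs"]
    if expected.get("valid_json"):
        json.loads(output)  # Raises ValueError if the output is not valid JSON.
    for pattern in expected.get("content_patterns", []):
        assert re.search(pattern, output), f"missing pattern: {pattern}"
```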