remoting/tools/magi-mode/SKILL_TEST_PLAN.md
This document outlines the strategy and methodology for testing the MAGI protocol implementation in the agent framework.
To ensure the protocol is both correct (logic) and effective (agent capability) while keeping testing costs and execution time reasonable, we use a layered approach:
To test the orchestration, state transitions, and consolidation logic (Phase 5) without the high cost and latency of LLM calls:
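One way to do this is to swap each LLM-backed expert persona for a stub that returns canned findings. Orchestration and consolidation then run deterministically and offline. This is a minimal sketch under stated assumptions. All names (Finding, StubExpert, consolidate, run_review) are hypothetical. Consolidation is assumed to mean merging identical findings reported by several experts, and the real Phase 5 logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape: file, line, and a short issue label.
    path: str
    line: int
    issue: str


class StubExpert:
    """Hypothetical stand-in for an LLM-backed expert persona."""

    def __init__(self, name, findings):
        self.name = name
        self._findings = list(findings)
        self.calls = 0  # lets a test assert each expert was consulted

    def review(self, path):
        self.calls += 1
        return [f for f in self._findings if f.path == path]


def consolidate(findings_by_expert):
    """Hypothetical Phase 5 step: merge identical findings, track agreement."""
    merged = {}
    for expert, findings in findings_by_expert.items():
        for finding in findings:
            merged.setdefault(finding, set()).add(expert)
    # Sort so results are deterministic and easy to compare in tests.
    return sorted(merged.items(),
                  key=lambda kv: (kv[0].path, kv[0].line, kv[0].issue))


def run_review(experts, path):
    """Hypothetical orchestration: query every expert, then consolidate."""
    return consolidate({e.name: e.review(path) for e in experts})


# Example: two of three stubs report the same issue; it is merged once.
uaf = Finding("bind_post_task_helper.cc", 42, "base::Unretained")
experts = [StubExpert("expert_a", [uaf]),
           StubExpert("expert_b", [uaf]),
           StubExpert("expert_c", [])]
assert run_review(experts, "bind_post_task_helper.cc") == [
    (uaf, {"expert_a", "expert_b"})]
assert all(e.calls == 1 for e in experts)
```

Because no model is called, a test like this runs in milliseconds. It can also assert exact consolidated output, which a real agent run cannot guarantee.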
To test the actual capability of the expert personas to find real issues:
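For real agent runs, the output is free-form text. One simple, admittedly coarse, grading strategy is keyword matching against each test case's expected issue. This is a sketch that assumes the agent's review text is available as a string. EXPECTED and grade() are hypothetical, and the patterns are taken from the expected detections listed for the test data.

```python
import re

# Hypothetical expectations: test case -> regex its review must match.
# Keyword matching is coarse and may reject a correct finding that is
# phrased differently, so it is only one possible grading strategy.
EXPECTED = {
    "complex_uaf.cc.magi.test": r"base::Unretained",
    "unsafe_casting.cc.magi.test": r"reinterpret_cast",
    "win_handle_leak.cc.magi.test": r"CloseHandle",
    "linux_fd_leak.cc.magi.test": r"\bclose\(",
}


def grade(case, review_text):
    """Return True if the review mentions the case's expected issue.

    Raises KeyError for a case with no registered expectation, so a new
    test file cannot silently pass ungraded.
    """
    return re.search(EXPECTED[case], review_text) is not None
```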
We have developed a set of test files in tests/testdata/ representing common flaws, to be used in both mocked and real agent tests:

- complex_uaf.cc.magi.test (target: bind_post_task_helper.cc): expected to detect base::Unretained usage.
- unsafe_threading.cc.magi.test (target: thread_safe_manager.cc): expected to detect reverse lock acquisition.
- unsafe_casting.cc.magi.test (target: type_converter.cc): expected to detect reinterpret_cast usage.
- win_handle_leak.cc.magi.test (target: file_manager_win.cc): expected to detect a missing CloseHandle.
- mac_retain_cycle.mm.magi.test (target: notification_delegate.mm): expected to detect strong self capture in a block.
- linux_fd_leak.cc.magi.test (target: socket_handler_linux.cc): expected to detect a missing close() on early return.
- tautological_assert_test.cc.magi.test (target: mock_helper_unittest.cc): expected to detect a trivial assert (true==true).

To ensure transparency and verify that no critical steps (such as building and testing) are skipped during execution, the testing agent MUST generate a structured Test Execution Report artifact after running a suite of tests.
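A minimal sketch of such a report follows, assuming it is emitted as a JSON artifact. The class and field names are assumptions. The one property grounded in this plan is that skipped build or test steps must be visible in the report.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CaseResult:
    name: str      # e.g. "complex_uaf.cc.magi.test"
    passed: bool
    notes: str = ""


@dataclass
class TestExecutionReport:
    # Hypothetical fields: each critical step is recorded explicitly, so a
    # skipped build or test run shows up in the report instead of vanishing.
    build_ran: bool
    tests_ran: bool
    cases: List[CaseResult] = field(default_factory=list)

    def skipped_steps(self):
        """Names of critical steps that did not run."""
        steps = {"build": self.build_ran, "tests": self.tests_ran}
        return [name for name, ran in steps.items() if not ran]

    def to_json(self):
        """Serialize the report, including the derived skipped_steps list."""
        data = asdict(self)
        data["skipped_steps"] = self.skipped_steps()
        return json.dumps(data, indent=2, sort_keys=True)
```

Each step is recorded as an explicit boolean rather than omitted when it did not run. That is what lets a reviewer detect a skipped build or test run.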
The report must include:
If the automated test runner fails or hangs, an agent can manually execute the tests in parallel to save time:
1. Create separate working directories under remoting/tools/magi-mode/.temp/ and copy the testdata into them.
2. Issue invoke_subagent tool calls concurrently (one for each test case) so the cases run in parallel.

To prevent regressions in the protocol:

- Whenever SKILL.md or the test schemas are modified, the agent MUST run the applicable agent unit tests and PRESUBMIT checks.
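This trigger condition can also be checked mechanically, for example from a presubmit-style script that receives the list of changed files. The following sketch rests on several assumptions. SKILL.md is assumed to live in remoting/tools/magi-mode/, and test schemas are assumed to be *.schema.json files under that directory. requires_agent_tests() is a hypothetical helper, not an existing presubmit API.

```python
from pathlib import PurePosixPath

# Assumption: SKILL.md and the test schemas live under this directory.
SKILL_DIR = PurePosixPath("remoting/tools/magi-mode")


def requires_agent_tests(changed_paths):
    """True if any changed path is SKILL.md or a (hypothetical) test schema."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path == SKILL_DIR / "SKILL.md":
            return True
        # Assumption: schemas are named *.schema.json somewhere under SKILL_DIR.
        if SKILL_DIR in path.parents and path.name.endswith(".schema.json"):
            return True
    return False
```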