Test Reasoning Serialization

.forge/skills/test-reasoning/SKILL.md

Validates that ReasoningConfig fields are correctly serialized into provider-specific JSON for OpenRouter, Anthropic, GitHub Copilot, and Codex.

Quick Start

Run all tests with the bundled script:

```bash
./scripts/test-reasoning.sh
```

The script builds forge in debug mode, runs each provider/model combination, captures the outgoing HTTP request body via FORGE_DEBUG_REQUESTS, and asserts the correct JSON fields.
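One iteration of the run-and-capture step described above might look roughly like the sketch below. It is not taken from the bundled script: the `run_case` function, the `FORGE_BIN` override, and the cleanup of a stale capture are all assumptions made for illustration. The environment variables and the capture path follow the manual-run instructions in this document.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one test iteration; not the bundled script itself.
# FORGE_BIN is an assumed override so the binary path can be swapped out.
FORGE_BIN="${FORGE_BIN:-target/debug/forge}"
REQUEST_FILE=".forge/forge.request.json"

run_case() {
  local provider="$1" model="$2" effort="$3"
  # Drop any capture left over from a previous case (assumed precaution).
  rm -f "$REQUEST_FILE"
  # Variable assignments placed before the command apply to that run only.
  FORGE_DEBUG_REQUESTS="forge.request.json" \
  FORGE_SESSION__PROVIDER_ID="$provider" \
  FORGE_SESSION__MODEL_ID="$model" \
  FORGE_REASONING__EFFORT="$effort" \
  "$FORGE_BIN" -p "Hello!" >/dev/null 2>&1
}

# Usage: run_case open_router openai/o4-mini high
```

The function returns the exit status of the binary, so a caller can tell a rejected configuration from a successful run before looking at the captured body.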

Running a Single Test Manually

```bash
FORGE_DEBUG_REQUESTS="forge.request.json" \
FORGE_SESSION__PROVIDER_ID=<provider_id> \
FORGE_SESSION__MODEL_ID=<model_id> \
FORGE_REASONING__EFFORT=<effort> \
target/debug/forge -p "Hello!"
```

Then inspect .forge/forge.request.json for the expected fields.
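For a quick manual inspection, a text-level check with `grep` can confirm that a key/value pair appears in the captured body. This is a minimal sketch: `request_has_pair` is a hypothetical helper, it only handles string values and plain key names, and it does not verify nesting (for example, it cannot tell whether `effort` sits under `reasoning`). A JSON-aware tool is a better fit for structural assertions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file contains "<key>": "<value>".
# Text-level only; it does not parse JSON or check where the key is nested.
request_has_pair() {
  local file="$1" key="$2" value="$3"
  grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*\"${value}\"" "$file"
}

# Usage after a manual run with FORGE_REASONING__EFFORT=high:
#   request_has_pair .forge/forge.request.json effort high && echo "found"
```

Because the pattern includes the opening quote of the key, searching for `effort` does not match a top-level `reasoning_effort` key, which keeps the two provider styles distinguishable.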

Test Coverage

| Provider | Model | Config fields | Expected JSON field |
| --- | --- | --- | --- |
| open_router | openai/o4-mini | effort: none\|minimal\|low\|medium\|high\|xhigh | reasoning.effort |
| open_router | openai/o4-mini | max_tokens: 4000 | reasoning.max_tokens |
| open_router | openai/o4-mini | effort: high + exclude: true | reasoning.effort + .exclude |
| open_router | openai/o4-mini | enabled: true | reasoning.enabled |
| open_router | anthropic/claude-opus-4-5 | max_tokens: 4000 | reasoning.max_tokens |
| open_router | moonshotai/kimi-k2 | max_tokens: 4000 | reasoning.max_tokens |
| open_router | moonshotai/kimi-k2 | effort: high | reasoning.effort |
| open_router | minimax/minimax-m2 | max_tokens: 4000 | reasoning.max_tokens |
| open_router | minimax/minimax-m2 | effort: high | reasoning.effort |
| anthropic | claude-opus-4-6 | effort: low\|medium\|high\|max | output_config.effort |
| anthropic | claude-3-7-sonnet-20250219 | enabled: true + max_tokens: 8000 | thinking.type + budget_tokens |
| github_copilot | o4-mini | effort: none\|minimal\|low\|medium\|high\|xhigh | reasoning_effort (top-level) |
| codex | gpt-5.1-codex | effort: none\|minimal\|low\|medium\|high\|xhigh | reasoning.effort + .summary |
| codex | gpt-5.1-codex | effort: medium + exclude: true | reasoning.summary = "concise" |
| all providers | one model each | effort: invalid | non-zero exit, no request written |

Tests for unconfigured providers are skipped automatically. Invalid-effort tests run regardless of credentials, because the rejection happens at config parse time, before any provider interaction.
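The invalid-effort expectation (non-zero exit, no request written) can be checked with a small wrapper. The sketch below is an illustration only: `expect_rejected` is a hypothetical helper, and the bundled script may implement this check differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pass only if the command fails AND leaves no capture.
expect_rejected() {
  local request_file="$1"
  shift
  rm -f "$request_file"   # start from a clean slate
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command exited 0" >&2
    return 1
  fi
  if [ -e "$request_file" ]; then
    echo "FAIL: request was written to $request_file" >&2
    return 1
  fi
  echo "PASS: rejected before any request was written"
}

# Usage (placeholders as in the manual-run command; `env` sets the
# variables for this one invocation):
#   expect_rejected .forge/forge.request.json \
#     env FORGE_DEBUG_REQUESTS="forge.request.json" \
#         FORGE_SESSION__PROVIDER_ID=<provider_id> \
#         FORGE_SESSION__MODEL_ID=<model_id> \
#         FORGE_REASONING__EFFORT="invalid" \
#         target/debug/forge -p "Hello!"
```

Checking both conditions matters: a non-zero exit alone would also be produced by a network or credential failure that occurs after the request body has already been built.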

References