Back to Get Shit Done

Provider rate-limit signals across executor runtimes

docs/research/provider-rate-limit-signals.md

1.42.16.4 KB
Original Source

Provider rate-limit signals across executor runtimes

Status: research note — informs #3095 reactive classification and points at the proactive path forward.

GSD dispatches executor subagents into one of four host runtimes today: Claude Code, GitHub Copilot CLI, OpenAI Codex CLI, and Google Gemini CLI. Each provider exposes rate-limit information at three different layers — pre-warning, post-mortem error body, and underlying HTTP transport — but the host runtime (the CLI that wraps the provider for us) gates how much of that surfaces to GSD's orchestrator.

The reactive classifier shipped with #3095 (agent.classify-failure) parses post-mortem error bodies. This note records the proactive signals that exist at the provider layer but are not yet surfaced to orchestrators by the host runtimes — i.e. the forward path once host runtimes expose them to hooks.

Anthropic / Claude Code

LayerSignalAvailable to GSD today?
HTTP headersanthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-remaining, retry-after on 429No — Claude Code does not forward these to hooks (#33820, #22407)
Agent SDKRateLimitEvent with status transitions: allowedallowed_warningrejectedOnly when callers use the Agent SDK directly
Plan usageMax plan session / weekly usageNo — Claude Code SDK does not expose Max-plan limits (#32796)
Error body"You've hit your org's monthly usage limit", 429, rate_limit_errorYes — parsed by agent.classify-failure

GitHub Copilot CLI

LayerSignalAvailable to GSD today?
Pre-warningCLI displays a warning when approaching a limit (per GitHub Copilot usage-limits docs)Visible in subprocess stdout, not as a structured signal
Error body"hit a rate limit", "exceeded your Copilot token usage", rate_limited, user_weekly_rate_limitedYes — parsed by agent.classify-failure

OpenAI Codex CLI

LayerSignalAvailable to GSD today?
HTTP headersx-ratelimit-remaining-requests, x-ratelimit-remaining-tokensNo — Codex CLI does not forward these to hooks
Error body429, usage_limit_reached, "exceeded your current quota", Too Many Requests (openai/codex#9135)Yes — parsed by agent.classify-failure

Google Gemini CLI

LayerSignalAvailable to GSD today?
Error bodyRESOURCE_EXHAUSTED, "exceeded your current quota", 429Yes — parsed by agent.classify-failure
Pre-warningNone documented at the CLI layern/a

Forward path

When the host runtimes (Claude Code, Copilot CLI, Codex CLI) start exposing rate-limit headers and SDK events to hooks — which is the active ask in anthropics/claude-code#33820, #22407, and #32796 — GSD should:

  1. Read requests-remaining / tokens-remaining in a PreToolUse / SessionStart hook and surface a soft warning to the user before dispatching a new wave when the values fall below a configurable threshold.
  2. Treat RateLimitEvent.status == "allowed_warning" (Anthropic Agent SDK) as a checkpoint signal in long-running executors — emit a partial SUMMARY and let the orchestrator pick up after reset.
  3. Combine the proactive headers with executor.stall_threshold_minutes (#3329) so the orchestrator does not wait the full stall interval when the runtime has already signalled rejected.

Until then, agent.classify-failure is the actionable boundary: post-mortem error-body sentinels, with a quota-distinct recovery prompt in execute-phase step 7.

Sources