sdk/docs/caching.md
When building applications on the GSD SDK, system prompts that include workflow instructions (executor prompts, planner context, verification rules) are large and stable across requests. Prompt caching avoids re-processing these on every API call.
Use `cache_control` with a 1-hour TTL on system prompts that include GSD workflow content:
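```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// executorPrompt (string) and messages (array) are assumed to be defined by the caller.
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024, // required by the Messages API; the value here is illustrative
  system: [
    {
      type: 'text',
      text: executorPrompt, // GSD workflow instructions: large, stable across requests
      cache_control: { type: 'ephemeral', ttl: '1h' },
    },
  ],
  messages,
});
```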
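To check that the cache is actually being used, one option is to inspect the `usage` object on the response. The sketch below assumes the response reports `cache_creation_input_tokens` and `cache_read_input_tokens` counters; `CacheUsage`, `CacheOutcome`, and `classifyCacheUsage` are hypothetical names introduced for illustration and are not part of the GSD SDK.

```typescript
// Subset of the response usage fields that relate to prompt caching.
interface CacheUsage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

type CacheOutcome = 'hit' | 'write' | 'uncached';

// Classify a single response by what happened to the cached prefix.
function classifyCacheUsage(usage: CacheUsage): CacheOutcome {
  // Tokens were served from an existing cache entry.
  if ((usage.cache_read_input_tokens ?? 0) > 0) return 'hit';
  // No read, but a new cache entry was written for later requests.
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return 'write';
  // Neither counter moved: nothing was cached on this request.
  return 'uncached';
}
```

Calling `classifyCacheUsage(response.usage)` after each request gives a quick signal: a first request in a phase would be expected to report `'write'`, and later ones `'hit'`. An `'uncached'` result may mean the prompt is below the model's minimum cacheable length or that the prefix changed between requests.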
GSD workflows involve human review pauses between phases — discussing results, checking verification output, deciding next steps. The default 5-minute TTL expires during these pauses, forcing full re-processing of the system prompt on the next request.
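The effect of a review pause can be estimated locally. This is a rough sketch, not an API call: it assumes the TTL is counted from the last time the cached prompt was used, and `CacheTtl`, `TTL_MS`, and `cacheLikelyAlive` are hypothetical names chosen for the example.

```typescript
type CacheTtl = '5m' | '1h';

// TTL durations in milliseconds.
const TTL_MS: Record<CacheTtl, number> = {
  '5m': 5 * 60 * 1000,
  '1h': 60 * 60 * 1000,
};

// True if an entry last used at `lastUsedAt` should still be cached at `now`.
function cacheLikelyAlive(lastUsedAt: Date, now: Date, ttl: CacheTtl): boolean {
  return now.getTime() - lastUsedAt.getTime() < TTL_MS[ttl];
}

const lastRequest = new Date('2025-01-01T10:00:00Z');
const afterReview = new Date('2025-01-01T10:20:00Z'); // 20-minute review pause

console.log(cacheLikelyAlive(lastRequest, afterReview, '5m')); // false
console.log(cacheLikelyAlive(lastRequest, afterReview, '1h')); // true
```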
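With a 1-hour TTL, the cached system prompt stays available across review pauses of up to an hour. The table below lists which content to cache: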
| Prompt | Cache? | Reason |
|---|---|---|
| Executor system prompt | Yes | Large (~10K tokens), identical across tasks in a phase |
| Planner system prompt | Yes | Large, stable within a planning session |
| Verifier system prompt | Yes | Large, stable within a verification session |
| User/task-specific content | No | Changes per request |
In `session-runner.ts`, the `systemPrompt.append` field carries the executor/planner prompt. When using the Claude API directly (outside the Agent SDK's `query()` helper), wrap this content with `cache_control`:
```typescript
// In runPlanSession / runPhaseStepSession, the systemPrompt is:
systemPrompt: {
  type: 'preset',
  preset: 'claude_code',
  append: executorPrompt, // <-- this is the content to cache
}

// When calling the API directly, convert to:
system: [
  {
    type: 'text',
    text: executorPrompt,
    cache_control: { type: 'ephemeral', ttl: '1h' },
  },
]
```
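The conversion above could be wrapped in a small helper. This is a minimal sketch under stated assumptions: `PresetSystemPrompt`, `CachedTextBlock`, and `toApiSystem` are hypothetical names that do not exist in `session-runner.ts`, and only the `append` content is carried over, since the preset text itself is not available when calling the API directly.

```typescript
// Shape of the systemPrompt option shown above (hypothetical type name).
interface PresetSystemPrompt {
  type: 'preset';
  preset: string;
  append?: string;
}

// A system text block with a cache breakpoint (hypothetical type name).
interface CachedTextBlock {
  type: 'text';
  text: string;
  cache_control: { type: 'ephemeral'; ttl: '5m' | '1h' };
}

// Convert the Agent SDK style systemPrompt into a direct-API `system` array.
function toApiSystem(systemPrompt: PresetSystemPrompt): CachedTextBlock[] {
  // Nothing appended means there is no GSD content to cache.
  if (!systemPrompt.append) return [];
  return [
    {
      type: 'text',
      text: systemPrompt.append,
      cache_control: { type: 'ephemeral', ttl: '1h' },
    },
  ];
}
```

The result can be passed as the `system` field of a direct API request. Keeping the conversion in one place helps ensure every direct call sends the same prompt text, which matters because a cached prompt is only reused when the content is identical across requests.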