fern/01-guide/05-baml-advanced/prompt-caching.mdx
Recall that an LLM request usually looks like this, where each message can also carry provider-specific metadata. In Anthropic's case, that metadata includes a `cache_control` key:
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "<the entire contents of Pride and Prejudice>",
            "cache_control": {"type": "ephemeral"}
          },
          {
            "type": "text",
            "text": "Analyze the major themes in Pride and Prejudice."
          }
        ]
      }
    ]
  }'
```
This is nearly the same as the BAML code below, minus the `cache_control` metadata:
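```baml
function AnalyzeBook(book: string) -> string {
  client AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user") }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
```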
Let's add the cache-control metadata to our messages in BAML now. There are just two steps, both shown in the snippets that follow:

1. Pass the `cache_control` metadata to `_.role()` in the prompt.
2. Add `allowed_role_metadata` to the client definition so the metadata is actually forwarded.
```baml
function AnalyzeBook(book: string) -> string {
  client AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user", cache_control={"type": "ephemeral"}) }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
```
We have the "allowed_role_metadata" so that if you swap to other LLM clients, we don't accidentally forward the wrong metadata to the new provider API.
<Tip> Remember to switch from "Prompt Preview" to "Raw cURL" in the VSCode Playground to see the exact request being sent! </Tip>