
# Prompt Caching / Message Role Metadata

Recall that an LLM request usually looks like the example below, where each message can carry provider-specific metadata. In Anthropic's case, that metadata is a `cache_control` key.

```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "<the entire contents of Pride and Prejudice>",
            "cache_control": {"type": "ephemeral"}
          },
          {
            "type": "text",
            "text": "Analyze the major themes in Pride and Prejudice."
          }
        ]
      }
    ]
  }'
```

This is nearly the same as the following BAML code, minus the `cache_control` metadata:
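
```baml
function AnalyzeBook(book: string) -> string {
  client AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user") }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
```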

Let's now add the `cache_control` metadata to each of our messages in BAML. There are just two steps:

<Steps>
### Allow role metadata and headers in the client definition

```baml main.baml {5-8}
client<llm> AnthropicClient {
  provider "anthropic"
  options {
    model "claude-sonnet-4-5-20250929"
    allowed_role_metadata ["cache_control"]
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}
```

### Add the metadata to the messages

```baml main.baml
function AnalyzeBook(book: string) -> string {
  client AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user", cache_control={"type": "ephemeral"}) }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
```
</Steps>

We require `allowed_role_metadata` so that if you swap to another LLM client, we don't accidentally forward metadata that the new provider's API doesn't understand.
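For example, with a client like this hypothetical OpenAI one, which doesn't list `cache_control` in `allowed_role_metadata`, BAML simply strips the `cache_control` metadata from the prompt instead of forwarding it:

```baml
// Hypothetical client for illustration: since "cache_control" is not
// listed in allowed_role_metadata here, BAML drops that metadata from
// every message instead of sending it to OpenAI's API.
client<llm> OpenAIClient {
  provider "openai"
  options {
    model "gpt-4o"
  }
}
```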

<Tip> Remember to switch from "Prompt Review" to "Raw cURL" in the VSCode Playground to see the exact request being sent! </Tip>