When working with LLMs, token usage directly impacts both cost and latency. Different serialization formats can affect how many tokens are used to represent your data—but the optimal format depends on your specific use case and LLM.
Every optimization has trade-offs. Reducing token count doesn't automatically improve accuracy, and different LLMs may respond differently to different formats. Measure token usage and accuracy for your own prompts rather than assuming a format will help: what works for one use case may not work for another.
BAML's `format` filter lets you experiment with different serializations:

```jinja
{{ data|format(type="json") }}  {# Standard JSON #}
{{ data|format(type="yaml") }}  {# YAML format #}
{{ data|format(type="toon") }}  {# TOON format #}
```
TOON (Token-Oriented Object Notation) is a compact format that declares an array's length and field names once in a header, then writes each object as a single delimited row of values.

What TOON is good for: uniform arrays of objects, where every element shares the same set of fields.

When TOON may NOT help: deeply nested structures, non-uniform objects, or data without arrays of repeated records.
Learn more: TOON specification and benchmarks
TOON's efficiency comes from its tabular format for arrays. Your data's "tabular eligibility" affects how much TOON can help:
Example - High eligibility:

```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```

All users have identical fields → 100% tabular → TOON helps.
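For reference, TOON collapses that array to a header plus one row per user (a sketch; per the TOON spec, the key name prefixes the header when the array sits under a key):

```toon
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```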
Example - Low eligibility:

```json
{
  "config": {
    "server": { "host": "localhost", "port": 8080 },
    "database": { "name": "prod", "pool": { "min": 5, "max": 20 } }
  }
}
```

Deeply nested with no arrays of uniform objects → 0% tabular → JSON-compact likely better.
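For intuition: TOON falls back to indentation-based key-value nesting for non-tabular data (a sketch, based on the format's YAML-like nesting), which saves little over compact JSON:

```toon
config:
  server:
    host: localhost
    port: 8080
  database:
    name: prod
    pool:
      min: 5
      max: 20
```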
Here's how you might test different formats for a product analysis task:
```baml
class Product {
  id int
  name string
  price float
  in_stock bool
}

function AnalyzeProducts(products: Product[]) -> string {
  client GPT4
  prompt #"
    Analyze these products and provide insights:

    {{ products }}

    Focus on pricing trends and inventory status.
  "#
}
```
When you pass products to this function, they're serialized as JSON:
```json
[
  { "id": 1, "name": "Widget", "price": 9.99, "in_stock": true },
  { "id": 2, "name": "Gadget", "price": 19.99, "in_stock": false }
]
```
To test whether TOON works for your use case, try the `format(type="toon")` filter:
```baml
function AnalyzeProducts(products: Product[]) -> string {
  client GPT4
  prompt #"
    Analyze these products and provide insights:

    {{ products|format(type="toon") }}

    Focus on pricing trends and inventory status.
  "#
}
```
The same data serialized as TOON:
```toon
[2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false
```
Next steps: Test with your actual prompts and measure both token usage and accuracy.
Control spacing for better readability:

```jinja
{{ products|format(type="toon", indent=4) }}
```
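Indentation shows up once your data nests. For the keyed `users` array from earlier, `indent=4` would widen each level to four spaces (a sketch of the expected output):

```toon
users[2]{id,name,role}:
    1,Alice,admin
    2,Bob,user
```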
Choose the delimiter that works best for your data:

```jinja
{# Comma-separated (default) #}
{{ products|format(type="toon", delimiter="comma") }}

{# Tab-separated #}
{{ products|format(type="toon", delimiter="tab") }}

{# Pipe-separated #}
{{ products|format(type="toon", delimiter="pipe") }}
```
Delimiter trade-offs:

- Comma (`,`): The default; the most familiar, CSV-like choice.
- Tab (`\t`): Often tokenizes more efficiently than commas; tabs rarely appear in data (less quote-escaping needed); but some editors/terminals may display tabs inconsistently.
- Pipe (`|`): Middle ground between comma and tab; explicit visual separator.

Test different delimiters with your actual data - the best choice depends on your content.
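For example, pipe-delimited output for the products example looks roughly like this (a sketch; per the TOON spec, the active delimiter separates header fields as well as values):

```toon
[2]{id|name|price|in_stock}:
1|Widget|9.99|true
2|Gadget|19.99|false
```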
Add length indicators for clarity:

```jinja
{{ products|format(type="toon", length_marker="#") }}
```

Output:

```toon
[#2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false
```
Here's a complete example analyzing financial transactions:
```baml
class Transaction {
  id string
  date string
  amount float
  category string
  merchant string
  status string
}

function AnalyzeTransactions(
  transactions: Transaction[],
  question: string
) -> string {
  client GPT4
  prompt #"
    {{ _.role("system") }}
    You are a financial analyst. Answer questions about transaction data.

    {{ _.role("user") }}
    Transaction data:
    {{ transactions|format(type="toon", delimiter="pipe") }}

    Question: {{ question }}
  "#
}
```
```baml
test AnalyzeSpending {
  functions [AnalyzeTransactions]
  args {
    transactions [
      {
        id: "tx_001",
        date: "2025-01-15",
        amount: 45.99,
        category: "Dining",
        merchant: "Coffee Shop",
        status: "completed"
      },
      {
        id: "tx_002",
        date: "2025-01-16",
        amount: 120.00,
        category: "Shopping",
        merchant: "Electronics Store",
        status: "completed"
      }
    ]
    question "What's my largest expense category this week?"
  }
}
```
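With the pipe delimiter, the two test transactions serialize to something like this (a sketch; exact number formatting may vary):

```toon
[2]{id|date|amount|category|merchant|status}:
tx_001|2025-01-15|45.99|Dining|Coffee Shop|completed
tx_002|2025-01-16|120|Shopping|Electronics Store|completed
```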
Token reduction is only valuable if accuracy and reliability are maintained. Consider whether the model still answers correctly, follows instructions, and produces output you can parse reliably.
According to TOON's benchmarks, savings and accuracy vary by model and by data shape, so treat published numbers as a starting point rather than a guarantee.
Critical: Lost accuracy, increased debugging time, or degraded user experience typically cost far more than token savings. Always measure end-to-end impact on your specific workload, not just token counts.
Always establish a baseline with a standard format first:
```baml
function AnalyzeData(data: Dataset[]) -> Analysis {
  client GPT4
  prompt #"
    {{ _.role("user") }}
    Data:
    {{ data|format(type="json") }}  {# Start with the JSON baseline #}

    Provide analysis.
  "#
}
```
Measure: Accuracy, token usage, latency, cost
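A minimal sketch of the token-usage half of that measurement, assuming an OpenAI-style tokenizer via the `tiktoken` library; the two strings are the serialized product payloads shown earlier on this page:

```python
import tiktoken

# The same products serialized two ways (copied from the examples above).
json_version = """[
  { "id": 1, "name": "Widget", "price": 9.99, "in_stock": true },
  { "id": 2, "name": "Gadget", "price": 19.99, "in_stock": false }
]"""

toon_version = """[2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false"""

# Pick the encoding for the model you actually call.
enc = tiktoken.encoding_for_model("gpt-4")

for label, text in [("json", json_version), ("toon", toon_version)]:
    print(f"{label}: {len(enc.encode(text))} tokens")
```

Accuracy, latency, and cost still have to be measured against the live model; token counts alone don't decide the winner.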
Try different formats and compare results:
```jinja
{# Experiment 1: YAML #}
{{ data|format(type="yaml") }}

{# Experiment 2: TOON #}
{{ data|format(type="toon") }}

{# Experiment 3: TOON with options #}
{{ data|format(type="toon", delimiter="pipe") }}
```
Measure: Do you maintain accuracy? How much do tokens reduce?
Different formats work better for different structures. From TOON benchmarks:
```jinja
{# High tabular eligibility: uniform arrays #}
{{ products|format(type="toon") }}   {# Try TOON #}
{{ products|format(type="json") }}   {# Compare vs JSON #}

{# Low tabular eligibility: deeply nested config #}
{{ config|format(type="json") }}     {# JSON-compact often better #}
{{ config|format(type="yaml") }}     {# Or try YAML #}

{# Medium eligibility: mixed structures #}
{{ events|format(type="json") }}     {# Test multiple formats #}
{{ events|format(type="toon") }}     {# Results vary #}
```
Key insight: For pure flat tables, CSV is more compact than TOON. For deeply nested data, compact JSON may win. TOON's sweet spot is uniform arrays of objects with multiple fields.
Different models may respond differently to format changes. Test with the specific LLM you're using.
Tip from TOON documentation: When using TOON, show the format instead of describing it. Models parse the structure naturally from examples - the indentation and headers are usually self-documenting.
Important: Lower token count doesn't guarantee lower latency. Some models may process familiar formats (like JSON) faster even if they use more tokens. Measure end-to-end response time.