fern/01-guide/why-baml.mdx
Let's say you want to extract structured data from resumes. It starts simple enough...
But first, let's see where we're going with this story:
<iframe width="640" height="360" src="https://www.youtube.com/embed/S9jxdVLFDJU" frameborder="0" allowfullscreen></iframe>

BAML: What it is and how it helps - see the full developer experience
You begin with a basic LLM call to extract a name and skills:
```python
import openai

def extract_resume(text):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract name and skills from: {text}"}]
    )
    return response.choices[0].message.content
```
This works... sometimes. But you need structured data, not free text.
So you try JSON mode and add Pydantic for validation:
```python
from pydantic import BaseModel
import json

class Resume(BaseModel):
    name: str
    skills: list[str]

def extract_resume(text):
    prompt = f"""Extract resume data as JSON:
{text}
Return JSON with fields: name (string), skills (array of strings)"""
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    data = json.loads(response.choices[0].message.content)
    return Resume(**data)
```
Better! But now you need more fields. You add education, experience, and location:
```python
class Education(BaseModel):
    school: str
    degree: str
    year: int

class Resume(BaseModel):
    name: str
    skills: list[str]
    education: list[Education]
    location: str
    years_experience: int
```
The prompt gets longer and more complex. But wait - how do you test this without burning tokens?
Every test costs money and takes time:
```python
# This burns tokens every time you run tests!
def test_resume_extraction():
    test_resume = "John Doe, Python expert, MIT 2020..."
    result = extract_resume(test_resume)  # API call = $$$
    assert result.name == "John Doe"
```
You try mocking, but then you're not testing your actual extraction logic. Your prompt could be completely broken and tests would still pass.
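For instance, a mocked test might look like the sketch below (assuming the Pydantic-based `extract_resume` above). It stays green no matter what the prompt actually says, because the model call is replaced wholesale:

```python
from unittest.mock import MagicMock, patch

def test_resume_extraction_mocked():
    fake_json = (
        '{"name": "John Doe", "skills": ["Python"], '
        '"education": [], "location": "Boston", "years_experience": 4}'
    )
    fake_response = MagicMock()
    fake_response.choices[0].message.content = fake_json

    # The real prompt is never sent anywhere, so it is never actually tested
    with patch("openai.chat.completions.create", return_value=fake_response):
        result = extract_resume("John Doe, Python expert, MIT 2020...")

    assert result.name == "John Doe"  # Green even if the prompt is completely broken
```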
Real resumes break your extraction. The LLM returns malformed JSON:
```json
{
  "name": "John Doe",
  "skills": ["Python", "JavaScript"
  // Missing closing bracket!
```
You add retry logic, JSON fixing, error handling:
```python
import re
import time

from pydantic import ValidationError

def extract_resume(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = openai.chat.completions.create(...)
            content = response.choices[0].message.content

            # Try to fix common JSON issues
            content = fix_json(content)
            data = json.loads(content)
            return Resume(**data)
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff

def fix_json(content):
    # Remove text before/after JSON
    json_match = re.search(r'\{.*\}', content, re.DOTALL)
    if json_match:
        content = json_match.group(0)

    # Fix common issues
    content = content.replace(',}', '}')
    content = content.replace(',]', ']')
    # ... more fixes

    return content
```
Your simple extraction function is now 50+ lines of infrastructure code.
Your company wants to use Claude for some tasks (better reasoning) and GPT-4o mini for others (cost savings):
```python
def extract_resume(text, provider="openai", model="gpt-4o"):
    if provider == "openai":
        import openai
        client = openai.OpenAI()
        response = client.chat.completions.create(model=model, ...)
    elif provider == "anthropic":
        import anthropic
        client = anthropic.Anthropic()
        # Different API! Need to rewrite everything
        response = client.messages.create(model=model, ...)
    # ... handle different response formats
```
Each provider has different APIs, different response formats, different capabilities. Your code becomes a mess of if/else statements.
Your extraction fails on certain resumes. You need to debug, but what was actually sent to the LLM?
```python
# What prompt was generated? How many tokens did it use?
# Why did this specific resume fail?
# How do I optimize for cost?
#
# You can't easily see:
# - The exact prompt that was sent
# - How the schema was formatted
# - Token usage breakdown
# - Why specific fields were missed
```
You start adding logging, token counting, prompt inspection tools...
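A first cut at that observability layer might look like this rough sketch, where `build_prompt` and `parse_resume` are hypothetical helpers standing in for the prompt assembly and retry/parsing code above:

```python
import logging

logger = logging.getLogger("resume_extractor")

def extract_resume_logged(text):
    prompt = build_prompt(text)  # hypothetical helper wrapping the prompt template above
    logger.info("Prompt sent to LLM:\n%s", prompt)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )

    usage = response.usage
    logger.info("Tokens: prompt=%d, completion=%d", usage.prompt_tokens, usage.completion_tokens)
    logger.info("Raw response:\n%s", response.choices[0].message.content)

    # hypothetical helper: the fix_json/retry logic from earlier
    return parse_resume(response.choices[0].message.content)
```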
Now you need to classify seniority levels:
```python
from enum import Enum

class SeniorityLevel(str, Enum):
    JUNIOR = "junior"
    MID = "mid"
    SENIOR = "senior"
    STAFF = "staff"

class Resume(BaseModel):
    name: str
    skills: list[str]
    education: list[Education]
    seniority: SeniorityLevel
```
But the LLM doesn't know what these levels mean! You update the prompt:
```python
prompt = f"""Extract resume data as JSON:
Seniority levels:
- junior: 0-2 years experience
- mid: 2-5 years experience
- senior: 5-10 years experience
- staff: 10+ years experience
{text}
Return JSON with fields: name, skills, education, seniority..."""
```
Your prompt is getting huge and your business logic is scattered between code and strings.
In production, you need:

- Fallbacks to a second provider when the first is rate limited or down
- Retries with backoff
- Token and cost tracking
- Error monitoring
Your simple extraction function becomes a complex service:
```python
class ResumeExtractor:
    def __init__(self):
        self.primary_client = openai.OpenAI()
        self.fallback_client = anthropic.Anthropic()
        self.token_tracker = TokenTracker()
        self.error_monitor = ErrorMonitor()

    async def extract_with_fallback(self, text):
        try:
            return await self._extract_openai(text)
        except RateLimitError:
            return await self._extract_anthropic(text)
        except Exception as e:
            self.error_monitor.log(e)
            raise

    def _extract_openai(self, text):
        # 50+ lines of OpenAI-specific logic
        pass

    def _extract_anthropic(self, text):
        # 50+ lines of Anthropic-specific logic
        pass
```
What if you could go back to something simple, but keep all the power?
```baml
class Education {
  school string
  degree string
  year int
}

enum SeniorityLevel {
  JUNIOR @description("0-2 years of experience")
  MID @description("2-5 years of experience")
  SENIOR @description("5-10 years of experience")
  STAFF @description("10+ years of experience, technical leadership")
}

class Resume {
  name string
  skills string[]
  education Education[]
  seniority SeniorityLevel
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4
  prompt #"
    Extract information from this resume.

    Resume:
    ---
    {{ resume_text }}
    ---

    {{ ctx.output_format }}
  "#
}
```
Look what you get immediately:
BAML playground showing successful resume extraction with clear prompts and structured output
Test in VSCode playground without API calls or token costs:
Build up a library of test cases that run instantly
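Test cases live next to the function in the same BAML file. For example (a minimal sketch; the resume text is made up):

```baml
test john_doe_resume {
  functions [ExtractResume]
  args {
    resume_text #"
      John Doe
      Skills: Python, JavaScript
      Education: MIT, BS Computer Science, 2020
    "#
  }
}
```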
```baml
client<llm> GPT4 {
  provider openai
  options { model "gpt-4o" }
}

client<llm> Claude {
  provider anthropic
  options { model "claude-3-opus-20240229" }
}

client<llm> GPT4Mini {
  provider openai
  options { model "gpt-4o-mini" }
}

// Same function, any model - just change the client
function ExtractResume(resume_text: string) -> Resume {
  client GPT4 // Switch to Claude or GPT4Mini with one line
  prompt #"..."#
}
```
BAML's breakthrough is Schema-Aligned Parsing (SAP), which follows Postel's Law: "Be conservative in what you do, be liberal in what you accept from others."
Instead of rejecting imperfect outputs, SAP actively transforms them to match your schema using custom edit-distance algorithms.
<Tabs>
<Tab title="Performance Comparison">

SAP vs Other Approaches:

| Model | Function Calling | Python AST Parser | SAP |
|---|---|---|---|
| gpt-3.5-turbo | 87.5% | 75.8% | 92% |
| gpt-4o | 87.4% | 82.1% | 93% |
| claude-3-haiku | 57.3% | 82.6% | 91.7% |
Key insight: SAP + GPT-3.5 turbo beats GPT-4o + structured outputs, saving you money while improving accuracy.
</Tab> <Tab title="Error Correction">What SAP fixes automatically:
Raw LLM Output:
```json
// The model often outputs this mess:
{
  "name": John Doe,                     // Missing quotes
  "skills": ["Python", "JavaScript",],  // Trailing comma
  "experience": 3.5 years,              // Invalid type
  "bio": "I'm a "developer"",           // Unescaped quotes
  /* some comment */                    // JSON comments
  "confidence": 9/10                    // Fraction instead of decimal
}
```
SAP Transforms to:
```json
{
  "name": "John Doe",
  "skills": ["Python", "JavaScript"],
  "experience": 3.5,
  "bio": "I'm a \"developer\"",
  "confidence": 0.9
}
```
Error correction techniques:

- Quoting bare strings
- Removing trailing commas
- Coercing values to the declared type ("3.5 years" → 3.5)
- Stripping comments
- Converting fractions like 9/10 into decimals
Traditional JSON Schema (verbose):
```json
{
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "The person's full name"
    },
    "skills": {
      "type": "array",
      "items": {"type": "string"},
      "description": "List of technical skills"
    },
    "experience": {
      "type": "number",
      "description": "Years of experience"
    }
  },
  "required": ["name", "skills"]
}
```
Token count: ~180 tokens
BAML Schema (optimized):
```baml
class Resume {
  name string @description("The person's full name")
  skills string[] @description("List of technical skills")
  experience float? @description("Years of experience")
}
```
Token count: ~35 tokens
80% token reduction while being clearer to the model!
</Tab> <Tab title="Chain-of-Thought">Traditional approach - Choose reasoning OR structure:
```python
# Either get reasoning (unstructured)
reasoning = llm.complete("Analyze this resume and explain your thinking...")

# OR get structure (no reasoning)
resume = llm.structured_output(resume_schema, text)
```
BAML's SAP - Get both in one call:
```baml
class ResumeAnalysis {
  reasoning string @description("Step-by-step analysis")
  name string
  skills string[]
  seniority_level SeniorityLevel
  confidence_score float
}

function AnalyzeResume(text: string) -> ResumeAnalysis {
  client GPT4
  prompt #"
    Analyze this resume step by step, then extract structured data.

    Resume: {{ text }}

    {{ ctx.output_format }}
  "#
}
```
Result: Chain-of-thought reasoning AND structured output in a single API call.
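In application code this is still a single typed call (a sketch, using the generated Python client introduced further down):

```python
analysis = await b.AnalyzeResume(resume_text)

print(analysis.reasoning)        # free-form step-by-step analysis
print(analysis.seniority_level)  # still a typed SeniorityLevel value
```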
</Tab>
</Tabs>

Remember the retry loops and fallback plumbing from earlier? In BAML they are declarative:

```baml
retry_policy Exponential {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> RobustGPT4 {
  provider openai
  retry_policy Exponential
  options { model "gpt-4o" }
}

client<llm> SmartFallback {
  provider fallback
  options {
    strategy [GPT4, Claude, GPT4Mini]
  }
}
```
```python
from baml_client import baml as b

# Fully typed, works in Python, TypeScript, Java, Go
resume = await b.ExtractResume(resume_text)
print(resume.seniority)  # Type: SeniorityLevel
```
BAML generates fully typed clients for all languages automatically
See how changes instantly update the prompt:
Change your types → Prompt automatically updates → See the difference immediately
BAML's semantic streaming lets you build real UIs, with loading states and type-safe partial results:
```baml
class BlogPost {
  title string @stream.done @stream.not_null
  content string @stream.with_state
}
```
What this enables:

- `title` only appears in the stream once it is fully generated, and never as null (`@stream.done`, `@stream.not_null`)
- `content` streams in incrementally, carrying a state flag that tells you whether it is still in progress (`@stream.with_state`)
- Enough guarantees to drive loading bars and progressive UIs without hand-rolled diffing

See semantic streaming in action - structured data streaming with loading states
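On the consuming side, reading the stream might look like this minimal sketch (`WriteBlogPost` is a hypothetical BAML function returning `BlogPost`, and `render` stands in for your UI code):

```python
# Sketch only: "WriteBlogPost" and render() are illustrative, not part of the example above
stream = b.stream.WriteBlogPost("Why structured streaming matters")

async for partial in stream:
    # partial.title is None until the model has finished it (and never null after that);
    # partial.content grows as tokens arrive, along with its streaming state
    render(partial)

final = await stream.get_final_response()
```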
You started with a simple LLM call. You ended up with hundreds of lines of infrastructure code.
With BAML, you get:

- Type-safe, structured outputs defined in one place
- A playground for testing prompts without burning tokens
- One-line switching between models and providers
- Schema-Aligned Parsing instead of brittle JSON fixing
- Declarative retries, fallbacks, and streaming
BAML is what LLM development should have been from the start. Ready to see the difference? Get started with BAML.