workshops/2025-07-16/workshop_final.ipynb
Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML.
Let's start with a basic Python setup and a hello world program.
This guide will walk you through building agents in Python with BAML.
We'll start simple with a hello world program and gradually build up to a full agent.
For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.
Here's our simple hello world program:
# ./walkthrough/00-main.py
def hello():
print('hello, world!')
def main():
hello()
Let's run it to verify it works:
main()
Now let's add BAML and create our first agent with a CLI interface.
In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.
BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by BoundaryML (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.
BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.
BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.
First, let's set up BAML support in our notebook.
Don't worry too much about this setup code - it will make sense later! For now, just know that the get_baml_client() function will be used to interact with AI models.
!pip install baml-py==0.202.0 pydantic
import subprocess
import os
# Try to import Google Colab userdata, but don't fail if not in Colab
try:
from google.colab import userdata
IN_COLAB = True
except ImportError:
IN_COLAB = False
def baml_generate():
try:
result = subprocess.run(
["baml-cli", "generate"],
check=True,
capture_output=True,
text=True
)
if result.stdout:
print("[baml-cli generate]\n", result.stdout)
if result.stderr:
print("[baml-cli generate]\n", result.stderr)
except subprocess.CalledProcessError as e:
msg = (
f"`baml-cli generate` failed with exit code {e.returncode}\n"
f"--- STDOUT ---\n{e.stdout}\n"
f"--- STDERR ---\n{e.stderr}"
)
raise RuntimeError(msg) from None
def get_baml_client():
"""
a bunch of fun jank to work around the google colab import cache
"""
# Set API key from Colab secrets or environment
if IN_COLAB:
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
elif 'OPENAI_API_KEY' not in os.environ:
print("Warning: OPENAI_API_KEY not set. Please set it in your environment.")
baml_generate()
# Force delete all baml_client modules from sys.modules
import sys
modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]
for module in modules_to_delete:
del sys.modules[module]
# Now import fresh
import baml_client
return baml_client.sync_client.b
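Once the BAML files are in place, usage will look roughly like this. This is a hedged sketch only - DetermineNextStep is the BAML function we define in the next chapter, so it won't run yet:
# Hedged usage sketch (do not run yet): DetermineNextStep is only available
# after agent.baml is added and `baml-cli generate` has run.
# b = get_baml_client()
# next_step = b.DetermineNextStep('[{"type": "user_input", "data": "hi"}]')
# print(next_step)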
!baml-cli init
!ls baml_src
Now let's create our agent that will use BAML to process user input.
First, we'll define the core agent logic:
# ./walkthrough/01-agent.py
import json
from typing import Dict, Any, List
# the next step is either a tool call or a respond-to-human tool
AgentResponse = Any # This will be the return type from b.DetermineNextStep
class Event:
def __init__(self, type: str, data: Any):
self.type = type
self.data = data
class Thread:
def __init__(self, events: List[Dict[str, Any]]):
self.events = events
def serialize_for_llm(self):
# can change this to whatever custom serialization you want to do, XML, etc
# e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105
return json.dumps(self.events)
# right now this just runs one turn with the LLM, but
# we'll update this function to handle all the agent logic
def agent_loop(thread: Thread) -> AgentResponse:
b = get_baml_client() # This will be defined by the BAML setup
next_step = b.DetermineNextStep(thread.serialize_for_llm())
return next_step
Next, we need to define the BAML function that our agent will use.
BAML files define the LLM functions our agent can call, including the prompt template and the structured output types the model must return (like the DoneForNow class below).
This BAML file defines what our agent can do:
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml
!ls baml_src
Now let's create our main function that accepts a message parameter:
# ./walkthrough/01-main.py
def main(message="hello from the notebook!"):
# Create a new thread with the user's message as the initial event
thread = Thread([{"type": "user_input", "data": message}])
# Run the agent loop with the thread
result = agent_loop(thread)
print(result)
Let's test our agent! Try calling main() with different messages:
main("What's the weather like?")main("Tell me a joke")main("How are you doing today?")in this case, we'll use the baml_generate function to generate the pydantic and python bindings from our baml source, but in the future we'll skip this step as it is done automatically by the get_baml_client() function
baml_generate()
main("Hello from the Python notebook!")
Let's add some calculator tools to our agent.
Let's start by adding a tool definition for the calculator.
These are simple structured outputs that we'll ask the model to return as a "next step" in the agentic loop.
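For intuition, here is roughly the Pydantic shape that baml-cli generate will produce for one of these tools - a sketch only, with names assumed from how the agent loop reads next_step.intent, next_step.a, and next_step.b later on:
# Illustrative sketch only - the real classes are generated from
# tool_calculator.baml by `baml-cli generate`; the names here are assumed.
from typing import Literal
from pydantic import BaseModel

class AddTool(BaseModel):
    intent: Literal["add"]
    a: float
    b: float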
!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml
!ls baml_src
Now, let's update the agent's DetermineNextStep method to expose the calculator tools as potential next steps.
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml
Now let's update our main function to show the tool call:
# ./walkthrough/02-main.py
def main(message="hello from the notebook!"):
# Create a new thread with the user's message
thread = Thread([{"type": "user_input", "data": message}])
# Get BAML client
b = get_baml_client()
# Get the next step from the agent - just show the tool call
next_step = b.DetermineNextStep(thread.serialize_for_llm())
# Print the raw response to show the tool call
print(next_step)
Let's try out the calculator! The agent should recognize that you want to perform a calculation and return the appropriate tool call instead of just a message.
main("can you add 3 and 4")
Now let's add a real agentic loop that can run the tools and get a final answer from the LLM.
In this chapter, we'll enhance our agent to process tool calls in a loop. This means the agent will ask the LLM for the next step, execute any requested tool, append the result to the thread, and repeat until the LLM returns a final answer.
Let's update our agent to handle tool calls properly:
# ./walkthrough/03-agent.py
import json
from typing import Dict, Any, List
class Thread:
def __init__(self, events: List[Dict[str, Any]]):
self.events = events
def serialize_for_llm(self):
# can change this to whatever custom serialization you want to do, XML, etc
# e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105
return json.dumps(self.events)
def agent_loop(thread: Thread) -> str:
b = get_baml_client()
while True:
next_step = b.DetermineNextStep(thread.serialize_for_llm())
print("nextStep", next_step)
if next_step.intent == "done_for_now":
# response to human, return the next step object
return next_step.message
elif next_step.intent == "add":
thread.events.append({
"type": "tool_call",
"data": next_step.__dict__
})
result = next_step.a + next_step.b
print("tool_response", result)
thread.events.append({
"type": "tool_response",
"data": result
})
continue
else:
raise ValueError(f"Unknown intent: {next_step.intent}")
Now let's update our main function to use the new agent loop:
# ./walkthrough/03-main.py
def main(message="hello from the notebook!"):
# Create a new thread with the user's message
thread = Thread([{"type": "user_input", "data": message}])
# Run the agent loop with full tool handling
result = agent_loop(thread)
# Print the final response
print(f"\nFinal response: {result}")
Let's try it out! The agent should now call the tool and return the calculated result:
main("can you add 3 and 4")
You should see the agent select the add tool, execute it, append the result to the thread, and then return a final answer based on the calculation.
For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:
# ./walkthrough/03b-agent.py
import json
from typing import Dict, Any, List, Union
class Thread:
def __init__(self, events: List[Dict[str, Any]]):
self.events = events
def serialize_for_llm(self):
# can change this to whatever custom serialization you want to do, XML, etc
# e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105
return json.dumps(self.events)
def handle_next_step(next_step, thread: Thread) -> Thread:
result: float
if next_step.intent == "add":
result = next_step.a + next_step.b
print("tool_response", result)
thread.events.append({
"type": "tool_response",
"data": result
})
return thread
elif next_step.intent == "subtract":
result = next_step.a - next_step.b
print("tool_response", result)
thread.events.append({
"type": "tool_response",
"data": result
})
return thread
elif next_step.intent == "multiply":
result = next_step.a * next_step.b
print("tool_response", result)
thread.events.append({
"type": "tool_response",
"data": result
})
return thread
elif next_step.intent == "divide":
result = next_step.a / next_step.b
print("tool_response", result)
thread.events.append({
"type": "tool_response",
"data": result
})
return thread
def agent_loop(thread: Thread) -> str:
b = get_baml_client()
while True:
next_step = b.DetermineNextStep(thread.serialize_for_llm())
print("nextStep", next_step)
thread.events.append({
"type": "tool_call",
"data": next_step.__dict__
})
if next_step.intent == "done_for_now":
# response to human, return the next step object
return next_step.message
elif next_step.intent in ["add", "subtract", "multiply", "divide"]:
thread = handle_next_step(next_step, thread)
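As an aside, the four near-identical branches in handle_next_step can be collapsed with a dispatch table. This is a sketch of an alternative, not part of the walkthrough's files:
# Alternative sketch: dispatch calculator intents through an operator table.
import operator

CALCULATOR_OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def handle_calculator_step(next_step, thread):
    # Apply the operation named by the intent to the two operands
    result = CALCULATOR_OPS[next_step.intent](next_step.a, next_step.b)
    print("tool_response", result)
    thread.events.append({"type": "tool_response", "data": result})
    return thread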
Now let's test subtraction:
main("can you subtract 3 from 4")
Test multiplication:
main("can you multiply 3 and 4")
Finally, let's test a complex multi-step calculation:
main("can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result")
Congratulations! You've taken your first step into hand-rolling an agent loop.
Key concepts you've learned: defining tools as structured outputs in BAML, running a hand-rolled agent loop, executing tool calls in plain Python, and appending results back onto the thread as context for the next step.
From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.
Let's add some tests to our BAML agent.
In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.
Let's start with a simple test that checks the agent's ability to handle basic interactions:
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04-agent.baml && cat baml_src/agent.baml
Run the tests to see them in action:
!baml-cli test
Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.
Assertions use the @@assert directive:
@@assert(name, {{condition}})
name: A descriptive name for the assertion
condition: A boolean expression that uses this to access the output
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04b-agent.baml && cat baml_src/agent.baml
Run the tests again to see assertions in action:
!baml-cli test
Finally, let's add more complex test cases that test multi-step conversations.
These tests simulate an entire conversation flow, including the initial user request, intermediate tool calls and their results, and the final response to the human.
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04c-agent.baml && cat baml_src/agent.baml
Run the comprehensive test suite:
!baml-cli test
With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!
In this section, we'll add support for multiple tools that serve to contact humans.
So far, our agent only returns a final answer with "done_for_now". But what if the agent needs clarification?
Let's add a new tool that allows the agent to request more information from the user.
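The new tool is just another structured output, roughly shaped like this. This is a sketch only - field names are assumed from how the loop below reads result.intent and result.message, and the real definition lives in the BAML file we download next:
# Illustrative sketch only - the real class is generated from agent.baml.
from typing import Literal
from pydantic import BaseModel

class ClarificationRequest(BaseModel):
    intent: Literal["request_more_information"]
    message: str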
First, let's update our BAML file to include a ClarificationRequest tool:
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/05-agent.baml && cat baml_src/agent.baml
Now let's update our agent to handle clarification requests:
# ./walkthrough/05-agent.py
# Agent implementation with clarification support
import json
def agent_loop(thread, clarification_handler, max_iterations=3):
"""Run the agent loop until we get a final answer (max 3 iterations)."""
iteration_count = 0
while iteration_count < max_iterations:
iteration_count += 1
print(f"š Agent loop iteration {iteration_count}/{max_iterations}")
# Get the client
baml_client = get_baml_client()
# Serialize the thread
thread_json = json.dumps(thread.events, indent=2)
# Call the agent
result = baml_client.DetermineNextStep(thread_json)
# Check what type of result we got based on intent
if hasattr(result, 'intent'):
if result.intent == 'done_for_now':
return result.message
elif result.intent == 'request_more_information':
# Get clarification from the human
clarification = clarification_handler(result.message)
# Add the clarification to the thread
thread.events.append({
"type": "clarification_request",
"data": result.message
})
thread.events.append({
"type": "clarification_response",
"data": clarification
})
# Continue the loop with the clarification
elif result.intent in ['add', 'subtract', 'multiply', 'divide']:
# Execute the appropriate tool based on intent
if result.intent == 'add':
result_value = result.a + result.b
operation = f"add({result.a}, {result.b})"
elif result.intent == 'subtract':
result_value = result.a - result.b
operation = f"subtract({result.a}, {result.b})"
elif result.intent == 'multiply':
result_value = result.a * result.b
operation = f"multiply({result.a}, {result.b})"
elif result.intent == 'divide':
if result.b == 0:
result_value = "Error: Division by zero"
else:
result_value = result.a / result.b
operation = f"divide({result.a}, {result.b})"
print(f"š§ Calling tool: {operation} = {result_value}")
# Add the tool call and result to the thread
thread.events.append({
"type": "tool_call",
"data": {
"tool": "calculator",
"operation": operation,
"result": result_value
}
})
else:
return "Error: Unexpected result type"
# If we've reached max iterations without a final answer
return f"Agent reached maximum iterations ({max_iterations}) without completing the task."
class Thread:
"""Simple thread to track conversation history."""
def __init__(self, events):
self.events = events
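For reference, here is roughly what thread.events looks like mid-way through a clarification exchange - illustrative only, since the exact wording of the request depends on the model:
# Illustrative only: a thread after the agent has asked for clarification
# and the human has responded.
example_events = [
    {"type": "user_input", "data": "can you multiply 3 and FD*(#F&&"},
    {"type": "clarification_request",
     "data": "FD*(#F&& isn't a number - what did you want to multiply 3 by?"},
    {"type": "clarification_response", "data": "I meant to multiply 3 and 4"},
]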
Finally, let's create a main function that handles human interaction:
# ./walkthrough/05-main.py
def get_human_input(prompt):
"""Get input from human, handling both Colab and local environments."""
print(f"\nš¤ {prompt}")
if IN_COLAB:
# In Colab, use actual input
response = input("Your response: ")
else:
# In local testing, return a fixed response
response = "I meant to multiply 3 and 4"
print(f"š [Auto-response for testing]: {response}")
return response
def main(message="hello from the notebook!"):
# Function to handle clarification requests
def handle_clarification(question):
return get_human_input(f"The agent needs clarification: {question}")
# Create a new thread with the user's message
thread = Thread([{"type": "user_input", "data": message}])
print(f"š Starting agent with message: '{message}'")
# Run the agent loop
result = agent_loop(thread, handle_clarification)
# Print the final response
print(f"\nā
Final response: {result}")
Let's test with an ambiguous input that should trigger a clarification request:
main("can you multiply 3 and FD*(#F&&")
You should see the agent return a request_more_information step asking what you meant, then continue and finish the calculation once you provide a clarification.
When running in Google Colab, the input() function will create an interactive text box where you can type your response. Try different clarifications to see how the agent adapts!
In this section, we'll explore how to customize the prompt of the agent with reasoning steps.
This is core to factor 2 - own your prompts
Adding explicit reasoning steps to your prompts can significantly improve agent performance by encouraging the model to think through the problem before committing to a tool call.
Let's update our agent prompt to include a reasoning step:
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/06-agent.baml && cat baml_src/agent.baml
Now let's test it with a simple calculation to see the reasoning in action:
main("can you multiply 3 and 4")
The model uses explicit reasoning steps to think through the problem before making a decision.
You can enhance your prompts further by adding domain-specific reasoning steps, asking the model to double-check its tool arguments, or having it state a short plan before acting.
The key is to guide the model's thinking process while still allowing flexibility.
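As a hedged illustration (not the workshop's actual prompt), the kind of reasoning instruction you might fold into DetermineNextStep's prompt looks like this:
# Hypothetical example of a reasoning-step instruction, shown as a Python
# string for illustration; in practice it would live in the BAML prompt.
REASONING_HINT = """
Before choosing a next step, reason briefly:
1. What is the user actually asking for?
2. Which tool (if any) is needed, and with what arguments?
3. Is anything missing that requires a clarification request?
"""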
In this section, we'll explore how to customize the context window of the agent.
This is core to factor 3 - own your context window
How you format your conversation history can significantly impact token usage, how well the model understands the context, and ultimately the quality of its next step.
Let's implement two serialization formats: pretty-printed JSON and XML.
# ./walkthrough/07-agent.py
# Agent with configurable serialization formats
import json
class Thread:
"""Thread that can serialize to different formats."""
def __init__(self, events):
self.events = events
def serialize_as_json(self):
"""Serialize thread events to pretty-printed JSON."""
return json.dumps(self.events, indent=2)
def serialize_as_xml(self):
"""Serialize thread events to XML format for better token efficiency."""
import yaml
xml_parts = ["<thread>"]
for event in self.events:
event_type = event['type']
event_data = event['data']
if event_type == 'user_input':
xml_parts.append(f' <user_input>{event_data}</user_input>')
elif event_type == 'tool_call':
# Use YAML for tool call args - more compact than nested XML
yaml_content = yaml.dump(event_data, default_flow_style=False).strip()
xml_parts.append(f' <{event_data["tool"]}>')
xml_parts.append(' ' + '\n '.join(yaml_content.split('\n')))
xml_parts.append(f' </{event_data["tool"]}>')
elif event_type == 'clarification_request':
xml_parts.append(f' <clarification_request>{event_data}</clarification_request>')
elif event_type == 'clarification_response':
xml_parts.append(f' <clarification_response>{event_data}</clarification_response>')
xml_parts.append("</thread>")
return "\n".join(xml_parts)
def agent_loop(thread, clarification_handler, use_xml=True):
"""Run the agent loop with configurable serialization."""
while True:
# Get the client
baml_client = get_baml_client()
# Serialize the thread based on format preference
if use_xml:
thread_str = thread.serialize_as_xml()
print(f"š Using XML serialization ({len(thread_str)} chars)")
else:
thread_str = thread.serialize_as_json()
print(f"š Using JSON serialization ({len(thread_str)} chars)")
# Call the agent
result = baml_client.DetermineNextStep(thread_str)
# Check what type of result we got based on intent
if hasattr(result, 'intent'):
if result.intent == 'done_for_now':
return result.message
elif result.intent == 'request_more_information':
# Get clarification from the human
clarification = clarification_handler(result.message)
# Add the clarification to the thread
thread.events.append({
"type": "clarification_request",
"data": result.message
})
thread.events.append({
"type": "clarification_response",
"data": clarification
})
# Continue the loop with the clarification
elif result.intent in ['add', 'subtract', 'multiply', 'divide']:
# Execute the appropriate tool based on intent
if result.intent == 'add':
result_value = result.a + result.b
operation = f"add({result.a}, {result.b})"
elif result.intent == 'subtract':
result_value = result.a - result.b
operation = f"subtract({result.a}, {result.b})"
elif result.intent == 'multiply':
result_value = result.a * result.b
operation = f"multiply({result.a}, {result.b})"
elif result.intent == 'divide':
if result.b == 0:
result_value = "Error: Division by zero"
else:
result_value = result.a / result.b
operation = f"divide({result.a}, {result.b})"
print(f"š§ Calling tool: {operation} = {result_value}")
# Add the tool call and result to the thread
thread.events.append({
"type": "tool_call",
"data": {
"tool": "calculator",
"operation": operation,
"result": result_value
}
})
else:
return "Error: Unexpected result type"
Now let's create a main function that can switch between formats:
# ./walkthrough/07-main.py
def main(message="hello from the notebook!", use_xml=True):
# Function to handle clarification requests
def handle_clarification(question):
return get_human_input(f"The agent needs clarification: {question}")
# Create a new thread with the user's message
thread = Thread([{"type": "user_input", "data": message}])
print(f"š Starting agent with message: '{message}'")
print(f"š Using {'XML' if use_xml else 'JSON'} format for thread serialization")
# Run the agent loop with XML serialization
result = agent_loop(thread, handle_clarification, use_xml=use_xml)
# Print the final response
print(f"\nā
Final response: {result}")
Let's test with JSON format first:
main("can you multiply 3 and 4, then divide the result by 2", use_xml=False)
Now let's try the same with XML format:
main("can you multiply 3 and 4, then divide the result by 2", use_xml=True)
XML benefits: it is typically more token-efficient for long threads and gives the model clear, readable structure for each event.
JSON benefits: it needs no extra serialization code, matches how the events are already stored, and is easy to inspect and parse programmatically.
Choose based on your specific needs and token constraints!