fern/01-guide/04-baml-basics/timeouts.mdx
Timeouts help you build resilient applications by preventing requests from hanging indefinitely. BAML provides granular timeout controls at multiple stages of the request lifecycle.
Without timeouts, your application can stall when a provider is slow to connect, slow to return its first token, or stops sending data mid-response. Timeouts let you fail fast and either retry or fall back to an alternative client.
Add timeouts to any client by specifying timeout values in the `http` block within `options`:
```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    // Set timeouts (all values in milliseconds)
    http {
      connect_timeout_ms 5000    // 5 seconds to connect
      request_timeout_ms 30000   // 30 seconds total
    }
  }
}
```
BAML supports four types of timeouts for individual requests, plus a fifth timeout type for composite clients (fallback, round-robin):
### `connect_timeout_ms`

Maximum time to establish a connection to the LLM provider.

**When to use:** Detect unreachable endpoints quickly.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 3000  // Fail if can't connect within 3s
    }
  }
}
```
### `time_to_first_token_timeout_ms`

Maximum time to receive the first token after sending the request.

**When to use:** Detect when the provider accepts your request but takes too long to start generating.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      time_to_first_token_timeout_ms 10000  // First token within 10s
    }
  }
}
```
### `idle_timeout_ms`

Maximum time between receiving data chunks during streaming.

**When to use:** Detect stalled connections where the provider stops sending data mid-response.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      idle_timeout_ms 15000  // No more than 15s between chunks
    }
  }
}
```
### `request_timeout_ms`

Maximum total time for the entire request-response cycle.

**When to use:** Ensure requests complete within your application's latency requirements.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // Complete within 60s total
    }
  }
}
```
Each retry attempt gets the full timeout duration:
```baml
retry_policy Aggressive {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> MyClient {
  provider openai
  retry_policy Aggressive
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 30000  // 30s per attempt; each retry gets a fresh 30s
    }
  }
}
```
If the first attempt times out at 30 seconds, the retry mechanism kicks in and the next attempt gets a fresh 30-second timeout.
Total time: up to 4 attempts × 30s each, plus backoff delays between retries (roughly two minutes or more).
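To budget for the worst case, you can compute this ceiling directly. A small sketch (the backoff delays here are illustrative values, not BAML's actual retry schedule):

```python
def worst_case_total_ms(attempts: int, per_attempt_timeout_ms: int,
                        base_delay_ms: int = 1000, multiplier: float = 2.0) -> float:
    """Upper bound on wall-clock time when every attempt times out.

    Assumes exponential backoff delays between attempts:
    base_delay, base_delay * multiplier, ... (illustrative, not BAML's schedule).
    """
    timeouts = attempts * per_attempt_timeout_ms
    delays = sum(base_delay_ms * multiplier**i for i in range(attempts - 1))
    return timeouts + delays

# 4 attempts (1 initial + 3 retries) at 30s each, plus 1s + 2s + 4s of backoff
print(worst_case_total_ms(4, 30_000))  # 127000.0 → just over 2 minutes
```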
You can also override timeouts at runtime using the Client Registry.
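As a rough sketch of a runtime override in Python (this assumes `baml_py`'s `ClientRegistry` and a generated `baml_client`; the client name and timeout values are illustrative):

```python
import os

def strict_options() -> dict:
    # Tighter timeouts than the ones defined in the .baml file
    return {
        "model": "gpt-4",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "http": {
            "connect_timeout_ms": 2000,
            "request_timeout_ms": 10000,
        },
    }

def make_strict_registry():
    # Assumes baml_py is installed and exposes ClientRegistry
    from baml_py import ClientRegistry

    cr = ClientRegistry()
    cr.add_llm_client("StrictClient", "openai", strict_options())
    cr.set_primary("StrictClient")
    return cr

# Usage (inside your app):
#   result = await b.ExtractData(input, baml_options={"client_registry": make_strict_registry()})
```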
Timeout errors are a subclass of `BamlClientError` called `BamlTimeoutError`. You can catch them specifically:
```python Python
try:
    result = await b.ExtractData(input)
except BamlTimeoutError as e:
    # Handle timeout specifically
    print(f"Request timed out: {e.message}")
    print(f"Timeout type: {e.timeout_type}")
    print(f"Configured: {e.configured_value_ms}ms, Elapsed: {e.elapsed_ms}ms")
except BamlClientError as e:
    # Handle other client errors
    print(f"Client error: {e.message}")
```
```typescript TypeScript
import { b } from './baml_client'
import { BamlTimeoutError } from '@boundaryml/baml'

try {
  const result = await b.ExtractData(input)
} catch (e) {
  if (e instanceof BamlTimeoutError) {
    // Handle timeout specifically
    console.log(`Request timed out: ${e.message}`)
    console.log(`Timeout type: ${e.timeout_type}`)
    console.log(`Configured: ${e.configured_value_ms}ms, Elapsed: ${e.elapsed_ms}ms`)
  } else {
    // Handle other errors
    console.log(`Error: ${e}`)
  }
}
```
```ruby Ruby
begin
  result = b.extract_data(input)
rescue Baml::TimeoutError => e
  # Handle timeout specifically
  puts "Request timed out: #{e.message}"
  puts "Timeout type: #{e.timeout_type}"
  puts "Configured: #{e.configured_value_ms}ms, Elapsed: #{e.elapsed_ms}ms"
rescue Baml::ClientError => e
  # Handle other client errors
  puts "Client error: #{e.message}"
end
```
For more on error handling, see Error Handling.
For most production applications, we recommend starting with:
```baml
client<llm> ProductionClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 10000              // 10s to connect
      time_to_first_token_timeout_ms 30000  // 30s to first token
      idle_timeout_ms 2000                  // 2s between chunks
      request_timeout_ms 300000             // 5 minutes total
    }
  }
}
```
For fallback clients with stricter requirements:
```baml
client<llm> FallbackClient {
  provider fallback
  options {
    strategy [Primary, Secondary, Tertiary]
    http {
      connect_timeout_ms 5000               // Faster failover
      time_to_first_token_timeout_ms 15000
      idle_timeout_ms 2000
      request_timeout_ms 120000             // 2 min per attempt
    }
  }
}
```
Begin with generous timeouts and monitor your application's performance. Tighten timeouts gradually based on real-world data.
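One way to turn that monitoring data into a setting is to derive the timeout from an observed latency percentile with some headroom. A sketch (the 1.5× margin is an arbitrary starting point, not a BAML recommendation):

```python
import math

def timeout_from_latencies(latencies_ms: list[float], percentile: float = 0.99,
                           margin: float = 1.5) -> int:
    """Pick a request timeout as the given latency percentile times a safety margin."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: index of the value at or above `percentile`
    rank = max(0, math.ceil(percentile * len(ordered)) - 1)
    return int(ordered[rank] * margin)

observed = [800, 950, 1200, 1500, 2100, 2400, 3000, 3600, 4200, 9000]
print(timeout_from_latencies(observed))  # p99 is 9000ms, so a 13500ms timeout
```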
Faster models can use stricter timeouts:
```baml
client<llm> FastTurbo {
  provider openai
  options {
    model "gpt-3.5-turbo"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 15000  // Turbo is fast
    }
  }
}

client<llm> SlowButSmart {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // GPT-4 needs more time
    }
  }
}
```
Track how often timeouts occur using BAML Studio or your own observability tools. High timeout rates indicate that you should either raise your timeout values or switch to a faster model, provider, or fallback strategy.
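As a minimal sketch of the kind of tally such tooling performs (the outcome labels here are stand-ins for your own logging, not BAML output):

```python
from collections import Counter

def timeout_rate(outcomes: list[str]) -> float:
    """Fraction of requests that ended in a timeout."""
    counts = Counter(outcomes)
    return counts["timeout"] / len(outcomes) if outcomes else 0.0

outcomes = ["ok", "ok", "timeout", "ok", "timeout", "ok", "ok", "ok", "ok", "ok"]
print(f"{timeout_rate(outcomes):.0%}")  # 20%
```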
Timeouts and abort controllers serve different purposes: use timeouts for resilience and SLAs, and use abort controllers when users explicitly cancel operations.
You can use both together:
```typescript
const controller = new AbortController()

// User clicks "cancel" button
button.onclick = () => controller.abort()

try {
  const result = await b.ExtractData(input, {
    abortController: controller,
    // Client still has its configured timeouts
  })
} catch (e) {
  if (e instanceof BamlAbortError) {
    console.log('User cancelled')
  } else if (e instanceof BamlTimeoutError) {
    console.log('Request timed out')
  }
}
```