Configure timeouts on any BAML client to prevent requests from hanging indefinitely.
Timeouts can be configured on leaf clients (OpenAI, Anthropic, etc.) as well as on composite clients such as fallback strategies.
All timeout values are specified in milliseconds as positive integers.
<ParamField path="connect_timeout_ms" type="int">
Maximum time to establish a network connection to the provider.

Default: No timeout (infinite)
</ParamField>
```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 5000 // 5 seconds
    }
  }
}
```
<ParamField path="time_to_first_token_timeout_ms" type="int">
Maximum time to receive the first token of the response. Particularly useful for detecting when a provider accepts the request but takes too long to start generating.

Default: No timeout (infinite)
</ParamField>
```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      time_to_first_token_timeout_ms 10000 // 10 seconds
    }
  }
}
```
<ParamField path="idle_timeout_ms" type="int">
Maximum time between chunks of a streaming response. Important for detecting stalled streaming connections.

Default: No timeout (infinite)
</ParamField>
```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      idle_timeout_ms 15000 // 15 seconds
    }
  }
}
```
<ParamField path="request_timeout_ms" type="int">
Maximum time for the entire request to complete. For streaming responses, this applies to the entire stream duration (first token to last token).

Default: No timeout (infinite)
</ParamField>
```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000 // 60 seconds
    }
  }
}
```
When composite clients reference subclients with their own timeouts, the minimum (most restrictive) timeout wins.
```baml
client<llm> FastClient {
  provider openai
  options {
    model "gpt-3.5-turbo"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 3000
      request_timeout_ms 20000
    }
  }
}

client<llm> SlowClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000
    }
  }
}

client<llm> MyFallback {
  provider fallback
  options {
    strategy [FastClient, SlowClient]
    http {
      connect_timeout_ms 5000 // Parent timeout
      idle_timeout_ms 15000 // Parent timeout
    }
  }
}
```
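The "minimum wins" rule can be sketched in a few lines of Python. This is an illustration only, not BAML's implementation; `None` stands in for an unset timeout, which behaves like infinity:

```python
# Illustrative sketch of the "minimum wins" composition rule.
# None stands in for an unset timeout (effectively infinite).
def effective_timeout(parent_ms, child_ms):
    """Compose a composite client's timeout with a subclient's timeout."""
    values = [v for v in (parent_ms, child_ms) if v is not None]
    return min(values) if values else None

# Values from the FastClient/SlowClient example above:
print(effective_timeout(5000, 3000))   # connect: FastClient is stricter -> 3000
print(effective_timeout(None, 20000))  # request: only FastClient defines it -> 20000
print(effective_timeout(15000, None))  # idle: only parent defines it -> 15000
```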
Effective timeouts:

When calling FastClient:

- `connect_timeout_ms`: min(5000, 3000) = 3000ms (FastClient is stricter)
- `request_timeout_ms`: min(∞, 20000) = 20000ms (only FastClient defines it)
- `idle_timeout_ms`: min(15000, ∞) = 15000ms (only parent defines it)

When calling SlowClient:

- `connect_timeout_ms`: min(5000, ∞) = 5000ms (only parent defines it)
- `request_timeout_ms`: min(∞, 60000) = 60000ms (only SlowClient defines it)
- `idle_timeout_ms`: min(15000, ∞) = 15000ms (only parent defines it)

All timeouts are evaluated concurrently. A request fails when any timeout is exceeded:
- `connect_timeout_ms` applies during connection establishment
- `time_to_first_token_timeout_ms` starts when the request is sent
- `request_timeout_ms` starts when the request is sent
- `idle_timeout_ms` starts after each chunk is received

When a client has both timeouts and a retry policy, the timeouts apply to each individual attempt, not to the total time across retries.
Example:
```baml
retry_policy Exponential {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> MyClient {
  provider openai
  retry_policy Exponential
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 30000 // Each attempt gets 30 seconds
    }
  }
}
```
Maximum possible time: ~30s × 4 attempts + exponential backoff delays
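That worst-case arithmetic can be made concrete with a short sketch. The backoff delay values below (`base_delay_ms`, `multiplier`) are assumptions for illustration; the actual delays depend on the retry policy's configuration:

```python
# Illustrative worst-case wall-clock estimate for timeouts combined with retries.
# base_delay_ms and multiplier are assumed values, not BAML defaults.
def worst_case_ms(request_timeout_ms, max_retries, base_delay_ms=200, multiplier=2):
    attempts = max_retries + 1                      # initial try + retries
    timeout_total = attempts * request_timeout_ms   # each attempt may run to its full timeout
    backoff_total = sum(base_delay_ms * multiplier**i for i in range(max_retries))
    return timeout_total + backoff_total

# request_timeout_ms 30000 with max_retries 3: 4 attempts of up to 30s each,
# plus the exponential backoff delays between attempts.
print(worst_case_ms(30000, 3))
```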
Override timeout values at runtime using the client registry:
<CodeGroup>
```typescript TypeScript
import { b } from './baml_client'

const result = await b.MyFunction(input, {
  clientRegistry: b.ClientRegistry.override({
    "MyClient": {
      options: {
        http: {
          request_timeout_ms: 10000,
          idle_timeout_ms: 5000
        }
      }
    }
  })
})
```

```python Python
from baml_client import b

result = await b.MyFunction(
    input,
    baml_options={
        "client_registry": b.ClientRegistry.override({
            "MyClient": {
                "options": {
                    "http": {
                        "request_timeout_ms": 10000,
                        "idle_timeout_ms": 5000
                    }
                }
            }
        })
    }
)
```

```ruby Ruby
result = b.my_function(
  input,
  baml_options: {
    client_registry: b.ClientRegistry.override({
      "MyClient" => {
        options: {
          http: {
            request_timeout_ms: 10000,
            idle_timeout_ms: 5000
          }
        }
      }
    })
  }
)
```
</CodeGroup>
Runtime overrides follow the same composition rules: the minimum timeout wins when composing runtime values with config file values.
Timeout errors are represented by BamlTimeoutError, a subclass of BamlClientError:
```
BamlError
└── BamlClientError
    └── BamlTimeoutError
```
Timeout errors include structured fields:

- `client`: The client name that timed out
- `timeout_type`: The specific timeout that was exceeded
- `configured_value_ms`: The configured timeout value in milliseconds
- `elapsed_ms`: The actual elapsed time in milliseconds
- `message`: A human-readable error message

<CodeGroup>
```python Python
try:
    result = await b.MyFunction(input)
except BamlTimeoutError as e:
    print(f"Timeout: {e.timeout_type}")
    print(f"Configured: {e.configured_value_ms}ms")
    print(f"Elapsed: {e.elapsed_ms}ms")
```

```typescript TypeScript
import { BamlTimeoutError } from '@boundaryml/baml'

try {
  const result = await b.MyFunction(input)
} catch (e) {
  if (e instanceof BamlTimeoutError) {
    console.log(`Timeout: ${e.timeout_type}`)
    console.log(`Configured: ${e.configured_value_ms}ms`)
    console.log(`Elapsed: ${e.elapsed_ms}ms`)
  }
}
```

```ruby Ruby
begin
  result = b.my_function(input)
rescue Baml::TimeoutError => e
  puts "Timeout: #{e.timeout_type}"
  puts "Configured: #{e.configured_value_ms}ms"
  puts "Elapsed: #{e.elapsed_ms}ms"
end
```
</CodeGroup>
BAML validates timeout configurations at compile time:
- `request_timeout_ms` must be ≥ `time_to_first_token_timeout_ms` (if both are specified)

Invalid configurations will cause BAML to raise validation errors with helpful messages.
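The two validation rules stated in this page (positive-integer values, and the cross-field ordering constraint) can be sketched as follows. This is an illustration of the rules only, not BAML's actual validator:

```python
# Illustrative sketch of the validation rules; not BAML's implementation.
# None stands in for "not specified".
def check_timeouts(request_timeout_ms=None, time_to_first_token_timeout_ms=None):
    # All timeout values must be positive integers (in milliseconds).
    for name, value in (
        ("request_timeout_ms", request_timeout_ms),
        ("time_to_first_token_timeout_ms", time_to_first_token_timeout_ms),
    ):
        if value is not None and (not isinstance(value, int) or value <= 0):
            raise ValueError(f"{name} must be a positive integer")
    # Cross-field rule: the overall request timeout cannot be shorter than
    # the time-to-first-token timeout.
    if (request_timeout_ms is not None
            and time_to_first_token_timeout_ms is not None
            and request_timeout_ms < time_to_first_token_timeout_ms):
        raise ValueError(
            "request_timeout_ms must be >= time_to_first_token_timeout_ms"
        )

check_timeouts(request_timeout_ms=60000, time_to_first_token_timeout_ms=10000)  # ok
```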