Configuring Timeouts

Timeouts help you build resilient applications by preventing requests from hanging indefinitely. BAML provides granular timeout controls at multiple stages of the request lifecycle.

Why Use Timeouts?

Without timeouts, your application can stall when:

  • LLM provider endpoints are unreachable
  • Providers accept requests but take too long to respond
  • Network connections stall mid-stream
  • Long-running requests exceed your application's latency requirements

Timeouts let you fail fast and either retry or fall back to alternative clients.

Quick Start

Add timeouts to any client by specifying timeout values in the http block within options:

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    // Set timeouts (all values in milliseconds)
    http {
      connect_timeout_ms 5000      // 5 seconds to connect
      request_timeout_ms 30000     // 30 seconds total
    }
  }
}
```

Available Timeout Types

BAML supports four types of timeouts for individual requests, plus a fifth timeout type for composite clients (fallback, round-robin):

connect_timeout_ms

Maximum time to establish a connection to the LLM provider.

When to use: Detect unreachable endpoints quickly.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 3000  // Fail if can't connect within 3s
    }
  }
}
```

time_to_first_token_timeout_ms

Maximum time to receive the first token after sending the request.

When to use: Detect when the provider accepts your request but takes too long to start generating.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      time_to_first_token_timeout_ms 10000  // First token within 10s
    }
  }
}
```
<Tip> This timeout is especially useful for streaming responses where you want to ensure the LLM starts responding quickly, even if the full response takes longer. </Tip>

idle_timeout_ms

Maximum time between receiving data chunks during streaming.

When to use: Detect stalled connections where the provider stops sending data mid-response.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      idle_timeout_ms 15000  // No more than 15s between chunks
    }
  }
}
```

request_timeout_ms

Maximum total time for the entire request-response cycle.

When to use: Ensure requests complete within your application's latency requirements.

```baml
client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // Complete within 60s total
    }
  }
}
```

Timeouts with Retry Policies

Each retry attempt gets the full timeout duration:

```baml
retry_policy Aggressive {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> MyClient {
  provider openai
  retry_policy Aggressive
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 30000  // 30s per attempt; each retry gets a fresh 30s
    }
  }
}
```

If the first attempt times out at 30 seconds, the retry mechanism kicks in and the next attempt gets a fresh 30-second timeout.

Total time: Up to 4 attempts × 30s + retry delays = ~2+ minutes
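The arithmetic above can be sketched as a quick worst-case estimate. The backoff delays here (1s, 2s, 4s) are assumed for illustration; the actual delays depend on your `retry_policy` settings:

```python
# Worst-case wall clock for 1 initial attempt + 3 retries, each
# bounded by request_timeout_ms = 30000 (30s). Backoff delays of
# 1s, 2s, 4s are assumed; real values depend on the retry_policy.
attempts = 4
per_attempt_s = 30
backoff_delays_s = [2 ** i for i in range(attempts - 1)]  # [1, 2, 4]
total_s = attempts * per_attempt_s + sum(backoff_delays_s)
print(f"worst case: {total_s}s")  # worst case: 127s
```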

Runtime Timeout Overrides

Override timeouts at runtime using the Client Registry:
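A minimal Python sketch of the idea, assuming the `ClientRegistry` API exported by `baml_py`; the client name `FastClient` and the `ExtractData` function are illustrative, and whether `http` timeout options are accepted through `add_llm_client` is an assumption here:

```python
from baml_py import ClientRegistry
from baml_client import b  # generated client (assumed to exist)

async def extract_with_tight_timeout(data):
    cr = ClientRegistry()
    # Register a client with tighter timeouts than the ones in your .baml files
    cr.add_llm_client("FastClient", "openai", {
        "model": "gpt-4",
        "http": {"request_timeout_ms": 10000},  # 10s total (assumed option shape)
    })
    cr.set_primary("FastClient")

    # Pass the registry per-call to override the configured client
    return await b.ExtractData(data, baml_options={"client_registry": cr})
```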

Handling Timeout Errors

Timeout errors are raised as BamlTimeoutError, a subclass of BamlClientError, so you can catch them specifically:

<CodeGroup>
```python Python
from baml_client import b
from baml_py.errors import BamlTimeoutError, BamlClientError

try:
    result = await b.ExtractData(input)
except BamlTimeoutError as e:
    # Handle timeout specifically
    print(f"Request timed out: {e.message}")
    print(f"Timeout type: {e.timeout_type}")
    print(f"Configured: {e.configured_value_ms}ms, Elapsed: {e.elapsed_ms}ms")
except BamlClientError as e:
    # Handle other client errors
    print(f"Client error: {e.message}")
```

```typescript TypeScript
import { b } from './baml_client'
import { BamlTimeoutError } from '@boundaryml/baml'

try {
  const result = await b.ExtractData(input)
} catch (e) {
  if (e instanceof BamlTimeoutError) {
    // Handle timeout specifically
    console.log(`Request timed out: ${e.message}`)
    console.log(`Timeout type: ${e.timeout_type}`)
    console.log(`Configured: ${e.configured_value_ms}ms, Elapsed: ${e.elapsed_ms}ms`)
  } else {
    // Handle other errors
    console.log(`Error: ${e}`)
  }
}
```

```ruby Ruby
begin
  result = b.extract_data(input)
rescue Baml::TimeoutError => e
  # Handle timeout specifically
  puts "Request timed out: #{e.message}"
  puts "Timeout type: #{e.timeout_type}"
  puts "Configured: #{e.configured_value_ms}ms, Elapsed: #{e.elapsed_ms}ms"
rescue Baml::ClientError => e
  # Handle other client errors
  puts "Client error: #{e.message}"
end
```
</CodeGroup>

For more on error handling, see Error Handling.

Recommended Defaults

For most production applications, we recommend starting with:

```baml
client<llm> ProductionClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    http {
      connect_timeout_ms 10000                // 10s to connect
      time_to_first_token_timeout_ms 30000    // 30s to first token
      idle_timeout_ms 2000                    // 2s between chunks
      request_timeout_ms 300000               // 5 minutes total
    }
  }
}
```

For fallback clients with stricter requirements:

```baml
client<llm> FallbackClient {
  provider fallback
  options {
    strategy [Primary, Secondary, Tertiary]

    http {
      connect_timeout_ms 5000                 // Faster failover
      time_to_first_token_timeout_ms 15000
      idle_timeout_ms 2000
      request_timeout_ms 120000               // 2 min per attempt
    }
  }
}
```

Tips and Best Practices

Start Conservative, Then Optimize

Begin with generous timeouts and monitor your application's performance. Tighten timeouts gradually based on real-world data.

Different Timeouts for Different Models

Faster models can use stricter timeouts:

```baml
client<llm> FastTurbo {
  provider openai
  options {
    model "gpt-3.5-turbo"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 15000  // Turbo is fast
    }
  }
}

client<llm> SlowButSmart {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // GPT-4 needs more time
    }
  }
}
```

Monitor Timeout Rates

Track how often timeouts occur using BAML Studio or your own observability tools. High timeout rates indicate you should either:

  • Increase timeout values
  • Use faster models
  • Optimize your prompts
  • Add more fallback clients
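As a toy illustration of the kind of check an observability pipeline might run, here is a small Python helper; the 2% alert threshold and the counts are illustrative, not a BAML recommendation:

```python
# Toy helper: decide when a client's timeout rate needs attention.
def timeout_rate(timeouts: int, total_requests: int) -> float:
    return timeouts / total_requests if total_requests else 0.0

rate = timeout_rate(timeouts=42, total_requests=1000)
print(f"timeout rate: {rate:.1%}")  # timeout rate: 4.2%
if rate > 0.02:  # alert above 2% (illustrative threshold)
    print("Investigate: raise timeouts, use faster models, or add fallbacks")
```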

Timeouts vs Abort Controllers

Timeouts and abort controllers serve different purposes:

  • Timeouts: Automatic, configuration-based time limits
  • Abort controllers: Manual, user-initiated cancellation

Use timeouts for resilience and SLAs. Use abort controllers when users explicitly cancel operations.

You can use both together:

```typescript
import { b } from './baml_client'
import { BamlAbortError, BamlTimeoutError } from '@boundaryml/baml'

const controller = new AbortController()

// User clicks "cancel" button
button.onclick = () => controller.abort()

try {
  const result = await b.ExtractData(input, {
    abortController: controller
    // Client still has its configured timeouts
  })
} catch (e) {
  if (e instanceof BamlAbortError) {
    console.log('User cancelled')
  } else if (e instanceof BamlTimeoutError) {
    console.log('Request timed out')
  }
}
```