docs/adapters/nlp/openrouter.md
The OpenRouter service provides access to 400+ AI models through a single unified API, including GPT-4, Claude, Llama, Qwen, and many more. OpenRouter makes it easy to switch between different models for both text generation and embeddings without changing code.
```bash
# Set your OpenRouter API key (required)
export OPENROUTER_API_KEY="your-api-key-here"

# Optionally set default models
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-large"
```
```python
import parlant.sdk as p
from parlant.sdk import NLPServices

async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="AI Assistant",
        description="A helpful assistant powered by OpenRouter.",
    )
    # 🚀 Ready to use at http://localhost:8800
```
All configuration is done via environment variables. Set the required and optional environment variables before running your application:
```bash
# Required: API Key
export OPENROUTER_API_KEY="your-api-key-here"

# Optional: LLM Configuration
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_MAX_TOKENS="128000"

# Optional: Embedding Configuration
export OPENROUTER_EMBEDDER_MODEL="qwen/qwen3-embedding-8b"
export OPENROUTER_EMBEDDER_DIMENSIONS="4096"  # Optional override

# Optional: Analytics
export OPENROUTER_HTTP_REFERER="https://myapp.com"
export OPENROUTER_SITE_NAME="My Application"
```
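To make the precedence of these variables concrete, here is a minimal sketch of how an application might validate them at startup. The helper `load_openrouter_config` is hypothetical (it is not part of parlant); the defaults mirror those listed in the tables below.

```python
import os

# Hypothetical startup helper (not part of parlant): validates the
# required key and applies the documented defaults for the rest.
def load_openrouter_config() -> dict:
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {
        "api_key": api_key,
        "model": os.environ.get("OPENROUTER_MODEL", "openai/gpt-4o"),
        "embedder_model": os.environ.get(
            "OPENROUTER_EMBEDDER_MODEL", "openai/text-embedding-3-large"
        ),
        # None means "auto-detected per model"
        "max_tokens": int(os.environ["OPENROUTER_MAX_TOKENS"])
        if "OPENROUTER_MAX_TOKENS" in os.environ
        else None,
    }
```

Failing fast on a missing API key surfaces configuration mistakes before the server starts, rather than on the first model call.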
| Variable | Description | Example |
|---|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key | `sk-or-v1-...` |
| Variable | Description | Default | Example |
|---|---|---|---|
| `OPENROUTER_MODEL` | LLM model name | `openai/gpt-4o` | `openai/gpt-4o-mini` |
| `OPENROUTER_MAX_TOKENS` | Max tokens limit | Auto-detected | `128000` |
| Variable | Description | Default | Example |
|---|---|---|---|
| `OPENROUTER_EMBEDDER_MODEL` | Embedding model name | `openai/text-embedding-3-large` | `qwen/qwen3-embedding-8b` |
| `OPENROUTER_EMBEDDER_DIMENSIONS` | Override embedding dimensions | Auto-detected | `4096` |
| Variable | Description | Example |
|---|---|---|
| `OPENROUTER_HTTP_REFERER` | Your app's URL (for analytics) | `https://myapp.com` |
| `OPENROUTER_SITE_NAME` | Your app's name (for analytics) | `My Application` |
OpenRouter supports 400+ models from different providers. Models are automatically optimized with specialized configurations when available.
These models have specialized configurations for optimal performance:
| Model | Provider | Context | Use Case |
|---|---|---|---|
| `openai/gpt-4o` | OpenAI | 128K | Default, best overall quality |
| `openai/gpt-4o-mini` | OpenAI | 128K | Cost-effective, fast |
| `anthropic/claude-3.5-sonnet` | Anthropic | 200K | Advanced reasoning, long context |
| `meta-llama/llama-3.3-70b-instruct` | Meta | 8K | Open-source option |
The service supports multiple embedding models with automatic dimension detection:
| Model | Dimensions | Provider | Use Case |
|---|---|---|---|
| `openai/text-embedding-3-large` | 3072 | OpenAI | Default, high quality |
| `openai/text-embedding-3-small` | 1536 | OpenAI | Faster, smaller |
| `openai/text-embedding-ada-002` | 1536 | OpenAI | Legacy model |
| `qwen/qwen3-embedding-8b` | 4096 | Qwen | High dimension, multilingual |
| `qwen/qwen-embedding-v2` | 1536 | Qwen | Multilingual embeddings |
You can use any model that OpenRouter supports by setting the appropriate environment variables:
```bash
# LLM Models
export OPENROUTER_MODEL="google/gemini-pro-1.5"

# Embedding Models
export OPENROUTER_EMBEDDER_MODEL="qwen/qwen3-embedding-8b"
```
Check the OpenRouter Models page for the full list of available models.
Use the default models (GPT-4o for LLM, text-embedding-3-large for embeddings):
```python
import parlant.sdk as p
from parlant.sdk import NLPServices

async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="General Assistant",
        description="A helpful AI assistant."
    )
```
Use Claude for text generation:
```bash
export OPENROUTER_MODEL="anthropic/claude-3.5-sonnet"
```

```python
async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="Claude Assistant",
        description="Powered by Claude."
    )
```
Use a custom embedding model for better multilingual support:
```bash
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="qwen/qwen3-embedding-8b"
```

```python
async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="Multilingual Assistant",
        description="Supports multiple languages."
    )
```
Optimize for speed and quality:
```bash
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-large"
export OPENROUTER_MAX_TOKENS="128000"
```

```python
async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="High-Performance Agent",
        description="Optimized for speed and accuracy."
    )
```
Balance quality and cost:
```bash
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-small"
```

```python
async with p.Server(nlp_service=NLPServices.openrouter) as server:
    agent = await server.create_agent(
        name="Cost-Optimized Agent",
        description="Optimized for cost-effectiveness."
    )
```
Different embedding models produce vectors of different dimensions. The service automatically detects dimensions for known models, and can auto-detect from API responses for unknown models.
The following models have pre-configured dimensions:
- `openai/text-embedding-3-large`: 3072 dimensions
- `openai/text-embedding-3-small`: 1536 dimensions
- `openai/text-embedding-ada-002`: 1536 dimensions
- `qwen/qwen3-embedding-8b`: 4096 dimensions
- `qwen/qwen-embedding-v2`: 1536 dimensions

For unknown models, dimensions are automatically detected from the first API response and cached for subsequent use.
If needed, you can manually specify dimensions via environment variable:
```bash
export OPENROUTER_EMBEDDER_DIMENSIONS="4096"
```
⚠️ Important: If you change embedder models or dimensions, you may need to clear your vector database cache to avoid dimension mismatch errors.
OpenRouter intelligently handles model selection and configuration:
Known models use specialized generators for optimal performance:
- `openai/gpt-4o` → `OpenRouterGPT4O`
- `openai/gpt-4o-mini` → `OpenRouterGPT4OMini`
- `anthropic/claude-3.5-sonnet` → `OpenRouterClaude35Sonnet`
- `meta-llama/llama-3.3-70b-instruct` → `OpenRouterLlama33_70B`

Embedders are automatically configured based on the model name.
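The specialized-generator lookup with a generic fallback might be implemented along these lines. The `OpenRouter*` class names come from the mapping above, but their definitions here are empty placeholders, and `make_generator` is a hypothetical factory, not parlant's actual API.

```python
# Placeholder classes standing in for the specialized generators.
class OpenRouterGPT4O: ...
class OpenRouterGPT4OMini: ...
class OpenRouterClaude35Sonnet: ...
class OpenRouterLlama33_70B: ...

class OpenRouterGenericModel:
    """Fallback generator configured purely by model name."""
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

SPECIALIZED_GENERATORS = {
    "openai/gpt-4o": OpenRouterGPT4O,
    "openai/gpt-4o-mini": OpenRouterGPT4OMini,
    "anthropic/claude-3.5-sonnet": OpenRouterClaude35Sonnet,
    "meta-llama/llama-3.3-70b-instruct": OpenRouterLlama33_70B,
}

def make_generator(model_name: str):
    # Known models get a specialized generator; anything else falls
    # back to the generic one, so any OpenRouter model still works.
    cls = SPECIALIZED_GENERATORS.get(model_name)
    return cls() if cls else OpenRouterGenericModel(model_name)
```

This registry-with-fallback shape is why setting `OPENROUTER_MODEL` to an unlisted model still works: it simply bypasses the specialized configurations.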
Error:

```
OpenRouter API rate limit exceeded
```

Solutions:
- Reduce request frequency or retry after a short delay
- Check your usage limits on the OpenRouter dashboard
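One common mitigation for transient rate-limit errors is retrying with exponential backoff, sketched below. `RateLimitError` is a stand-in for whatever exception your client raises; `with_backoff` is a hypothetical helper, not part of parlant or OpenRouter.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client's rate-limit exception."""

def with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

The jitter term spreads out retries from concurrent workers so they do not all hit the rate limit again at the same instant.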
Error:

```
Model 'xyz' does not support JSON mode
```

Solutions:
- Switch to a model that supports JSON mode, such as `openai/gpt-4o`, `openai/gpt-4o-mini`, or `anthropic/claude-3.5-sonnet`
Error:

```
ValueError: all the input array dimensions except for the concatenation axis must match exactly
```

Solutions:
- Clear your vector database cache (the `parlant-data` directory) after changing embedder models or dimensions
Error:

```
OPENROUTER_API_KEY is not set
```

Solutions:
- Set the `OPENROUTER_API_KEY` environment variable before starting the server
Error:

```
Unable to construct dependency of type OpenRouterEmbedder
```

Solutions:
- Verify that `embedder_model_name` is correctly set

OpenRouter provides transparent pricing across models. Choose models based on your needs:
```python
# GPT-4o-mini - Good quality, lower cost
model_name="openai/gpt-4o-mini"

# Claude Haiku - Fast, affordable
model_name="anthropic/claude-3-haiku"

# Llama - Open source, very affordable
model_name="meta-llama/llama-3.3-70b-instruct"

# text-embedding-3-small - Smaller, faster, cheaper
embedder_model_name="openai/text-embedding-3-small"

# text-embedding-ada-002 - Legacy, very affordable
embedder_model_name="openai/text-embedding-ada-002"

# GPT-4o - Highest quality
model_name="openai/gpt-4o"

# text-embedding-3-large - Highest quality embeddings
embedder_model_name="openai/text-embedding-3-large"
```
Check OpenRouter pricing for current rates.
- GPT-4o (`openai/gpt-4o`)
- GPT-4o-mini (`openai/gpt-4o-mini`)
- Claude (`anthropic/claude-3.5-sonnet`)
- Llama (`meta-llama/llama-3.3-70b-instruct`)
- text-embedding-3-large (`openai/text-embedding-3-large`)
- text-embedding-3-small (`openai/text-embedding-3-small`)
- qwen3-embedding-8b (`qwen/qwen3-embedding-8b`)
Begin with the default models (gpt-4o and text-embedding-3-large) for best balance of quality and performance.
Switch to gpt-4o-mini for high-volume operations where cost is a concern.
- `text-embedding-3-large` for quality-critical applications
- `text-embedding-3-small` for cost-sensitive deployments
- `qwen3-embedding-8b` for multilingual or high-dimensional needs

Prevent runaway costs by setting appropriate `max_tokens` limits via environment variable:
```bash
export OPENROUTER_MAX_TOKENS="128000"  # For long-context models
export OPENROUTER_MAX_TOKENS="8192"    # For standard use cases
```
Regularly check the OpenRouter dashboard to monitor usage and costs.
Set OPENROUTER_HTTP_REFERER and OPENROUTER_SITE_NAME to track usage across different applications.
If you switch embedder models, clear your vector database cache to avoid dimension mismatches.
Use environment variables for production deployments instead of hardcoding values:
```bash
# Production configuration
export OPENROUTER_API_KEY="sk-or-v1-..."
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-large"
```
If using an embedding model not in the known list, you can specify dimensions:
```bash
export OPENROUTER_EMBEDDER_MODEL="custom/embedding-model"
export OPENROUTER_EMBEDDER_DIMENSIONS="2048"
```
The service will also auto-detect dimensions from the first API response.
All configuration is done via environment variables. Set multiple variables to configure different aspects:
```bash
# Set all configuration via environment variables
export OPENROUTER_MODEL="anthropic/claude-3.5-sonnet"
export OPENROUTER_MAX_TOKENS="200000"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-large"
```
Here's a complete example showing a production-ready setup:
```bash
# Set environment variables
export OPENROUTER_API_KEY="your-api-key-here"
export OPENROUTER_MODEL="openai/gpt-4o-mini"
export OPENROUTER_EMBEDDER_MODEL="openai/text-embedding-3-large"
export OPENROUTER_MAX_TOKENS="32768"
```
```python
import parlant.sdk as p
from parlant.sdk import NLPServices

@p.tool
async def get_weather(context: p.ToolContext, city: str) -> p.ToolResult:
    # Your weather API logic here
    return p.ToolResult(f"Sunny, 72°F in {city}")

async def main():
    async with p.Server(nlp_service=NLPServices.openrouter) as server:
        agent = await server.create_agent(
            name="Weather Assistant",
            description="Helps users check weather conditions."
        )
        await agent.create_guideline(
            condition="User asks about weather",
            action="Get weather information using the get_weather tool",
            tools=[get_weather]
        )
        # 🚀 Ready at http://localhost:8800

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
This setup provides:
- Cost-effective text generation (`gpt-4o-mini`)
- High-quality embeddings (`text-embedding-3-large`)