ruflo/src/ruvocal/docs/source/configuration/llm-router.md
Chat UI includes an intelligent routing system that automatically selects the best model for each request. When enabled, users see a virtual "Omni" model that routes to specialized models based on the conversation context.
The router uses katanemo/Arch-Router-1.5B for route selection.
# Arch router endpoint (OpenAI-compatible)
LLM_ROUTER_ARCH_BASE_URL=https://router.huggingface.co/v1
LLM_ROUTER_ARCH_MODEL=katanemo/Arch-Router-1.5B
# Path to your routes policy JSON
LLM_ROUTER_ROUTES_PATH=./config/routes.json
Create a JSON file defining your routes. Each route specifies:
[
{
"name": "coding",
"description": "Programming, debugging, code review",
"primary_model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
"fallback_models": ["meta-llama/Llama-3.3-70B-Instruct"]
},
{
"name": "casual_conversation",
"description": "General chat, questions, explanations",
"primary_model": "meta-llama/Llama-3.3-70B-Instruct"
}
]
# Route to use when Arch returns "other"
LLM_ROUTER_OTHER_ROUTE=casual_conversation
# Model to use if Arch selection fails entirely
LLM_ROUTER_FALLBACK_MODEL=meta-llama/Llama-3.3-70B-Instruct
# Selection timeout (milliseconds)
LLM_ROUTER_ARCH_TIMEOUT_MS=10000
When a user sends an image, the router can bypass Arch and route directly to a vision model:
LLM_ROUTER_ENABLE_MULTIMODAL=true
LLM_ROUTER_MULTIMODAL_MODEL=meta-llama/Llama-3.2-90B-Vision-Instruct
When a user has MCP servers enabled, the router can automatically select a tools-capable model:
LLM_ROUTER_ENABLE_TOOLS=true
LLM_ROUTER_TOOLS_MODEL=meta-llama/Llama-3.3-70B-Instruct
Customize how the router appears in the model selector:
PUBLIC_LLM_ROUTER_ALIAS_ID=omni
PUBLIC_LLM_ROUTER_DISPLAY_NAME=Omni
PUBLIC_LLM_ROUTER_LOGO_URL=https://example.com/logo.png
When a user selects Omni:
The route selection is displayed in the UI so users can see which model was chosen.
To optimize router performance, message content is trimmed before sending to Arch:
# Max characters for assistant messages (default: 500)
LLM_ROUTER_MAX_ASSISTANT_LENGTH=500
# Max characters for previous user messages (default: 400)
LLM_ROUTER_MAX_PREV_USER_LENGTH=400
The latest user message is never trimmed.