LLM Router

Chat UI includes an intelligent routing system that automatically selects the best model for each request. When enabled, users see a virtual "Omni" model that routes to specialized models based on the conversation context.

The router uses katanemo/Arch-Router-1.5B for route selection.

Configuration

Basic Setup

ini

# Arch router endpoint (OpenAI-compatible)
LLM_ROUTER_ARCH_BASE_URL=https://router.huggingface.co/v1
LLM_ROUTER_ARCH_MODEL=katanemo/Arch-Router-1.5B

# Path to your routes policy JSON
LLM_ROUTER_ROUTES_PATH=./config/routes.json

Routes Policy

Create a JSON file defining your routes. Each route specifies:

json

[
	{
		"name": "coding",
		"description": "Programming, debugging, code review",
		"primary_model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",
		"fallback_models": ["meta-llama/Llama-3.3-70B-Instruct"]
	},
	{
		"name": "casual_conversation",
		"description": "General chat, questions, explanations",
		"primary_model": "meta-llama/Llama-3.3-70B-Instruct"
	}
]

Fallback Behavior

ini

# Route to use when Arch returns "other"
LLM_ROUTER_OTHER_ROUTE=casual_conversation

# Model to use if Arch selection fails entirely
LLM_ROUTER_FALLBACK_MODEL=meta-llama/Llama-3.3-70B-Instruct

# Selection timeout (milliseconds)
LLM_ROUTER_ARCH_TIMEOUT_MS=10000

Multimodal Routing

When a user sends an image, the router can bypass Arch and route directly to a vision model:

ini

LLM_ROUTER_ENABLE_MULTIMODAL=true
LLM_ROUTER_MULTIMODAL_MODEL=meta-llama/Llama-3.2-90B-Vision-Instruct

Tools Routing

When a user has MCP servers enabled, the router can automatically select a tools-capable model:

ini

LLM_ROUTER_ENABLE_TOOLS=true
LLM_ROUTER_TOOLS_MODEL=meta-llama/Llama-3.3-70B-Instruct

UI Customization

Customize how the router appears in the model selector:

ini

PUBLIC_LLM_ROUTER_ALIAS_ID=omni
PUBLIC_LLM_ROUTER_DISPLAY_NAME=Omni
PUBLIC_LLM_ROUTER_LOGO_URL=https://example.com/logo.png

How It Works

When a user selects Omni:

Chat UI sends the conversation context to the Arch router
Arch analyzes the content and returns a route name
Chat UI maps the route to the corresponding model
The request streams from the selected model
On errors, fallback models are tried in order

The route selection is displayed in the UI so users can see which model was chosen.

Message Length Limits

To optimize router performance, message content is trimmed before sending to Arch:

ini

# Max characters for assistant messages (default: 500)
LLM_ROUTER_MAX_ASSISTANT_LENGTH=500

# Max characters for previous user messages (default: 400)
LLM_ROUTER_MAX_PREV_USER_LENGTH=400

The latest user message is never trimmed.