Role-Specific LLM/VLM Configuration Guide

LightRAG supports configuring different LLMs or VLMs for different processing stages. This is useful when you want a lower-cost model for extraction, a stronger model for final answers, or a dedicated vision-language model for multimodal analysis.

Role Overview

Four roles are currently supported:

| Role | Purpose |
| --- | --- |
| EXTRACT | Entity/relation extraction and entity/relation description summarization. |
| KEYWORD | Query keyword extraction for high-level / low-level keyword generation before retrieval. |
| QUERY | Final QA, regular queries, bypass queries, and the query path of the Ollama-compatible API. |
| VLM | Multimodal analysis stage for VLM analysis of images, tables, formulas, and similar content. |

If a role has no dedicated configuration, LightRAG uses the base LLM_* configuration.

Base LLM Configuration

The base configuration defines the default LLM provider, model, service endpoint, authentication information, and concurrency control:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

# Default timeout for all LLM requests
LLM_TIMEOUT=180

# Default maximum concurrency for all LLM calls
MAX_ASYNC=4
```

Common fields:

| Variable | Description |
| --- | --- |
| LLM_BINDING | Base LLM provider. Supported values are openai, ollama, lollms, azure_openai, bedrock, and gemini. |
| LLM_MODEL | Base model name. For Azure OpenAI, this is usually the deployment name. |
| LLM_BINDING_HOST | Base provider endpoint. For SDK default endpoints, use the corresponding sentinel, such as DEFAULT_GEMINI_ENDPOINT or DEFAULT_BEDROCK_ENDPOINT. |
| LLM_BINDING_API_KEY | Base API key. Bedrock does not use this field. |
| LLM_TIMEOUT | Base LLM timeout. A role inherits it when no role timeout is set. |
| MAX_ASYNC | Base maximum LLM concurrency. A role inherits it when {ROLE}_MAX_ASYNC_LLM is not set. |

Role Override Variables

Each role can override the binding, model, endpoint, API key, concurrency, and timeout:

```env
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_api_key
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240
```

Variable format:

| Variable | Description |
| --- | --- |
| {ROLE}_LLM_BINDING | Overrides the role provider. ROLE can be EXTRACT, KEYWORD, QUERY, or VLM. |
| {ROLE}_LLM_MODEL | Overrides the role model name. |
| {ROLE}_LLM_BINDING_HOST | Overrides the role endpoint. |
| {ROLE}_LLM_BINDING_API_KEY | Overrides the role API key. Bedrock does not support it. |
| {ROLE}_MAX_ASYNC_LLM | Overrides the role maximum concurrency. Inherits MAX_ASYNC when unset. |
| {ROLE}_LLM_TIMEOUT | Overrides the role timeout. Inherits LLM_TIMEOUT when unset. |

Provider Option Overrides

Provider-specific options use the following format:

```env
{ROLE}_{PROVIDER_PREFIX}_{FIELD}
```

Examples:

```env
# Override only the OpenAI reasoning effort for the QUERY role
QUERY_OPENAI_LLM_REASONING_EFFORT=medium

# Override only Bedrock generation parameters for the EXTRACT role
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048

# Override only Gemini generation parameters for the VLM role
VLM_GEMINI_LLM_MAX_OUTPUT_TOKENS=4096
VLM_GEMINI_LLM_TEMPERATURE=0.2
```

Common provider prefixes:

| Provider | Base option prefix | Role option example |
| --- | --- | --- |
| openai / azure_openai | OPENAI_LLM_* | QUERY_OPENAI_LLM_REASONING_EFFORT |
| ollama | OLLAMA_LLM_* | EXTRACT_OLLAMA_LLM_NUM_PREDICT |
| lollms | Uses the Ollama-compatible option set | QUERY_OLLAMA_LLM_TEMPERATURE |
| bedrock | BEDROCK_LLM_* | EXTRACT_BEDROCK_LLM_MAX_TOKENS |
| gemini | GEMINI_LLM_* | VLM_GEMINI_LLM_THINKING_CONFIG |
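
The naming pattern can be expressed as a small function. This is a hypothetical helper written for illustration, not part of LightRAG; the prefix mapping is copied from the table above, including azure_openai sharing the OpenAI option set and lollms using the Ollama-compatible set.

```python
from typing import Optional

# Option prefixes per binding, copied from the provider prefix table.
PROVIDER_PREFIX = {
    "openai": "OPENAI_LLM",
    "azure_openai": "OPENAI_LLM",  # shares the OpenAI option set
    "ollama": "OLLAMA_LLM",
    "lollms": "OLLAMA_LLM",  # uses the Ollama-compatible option set
    "bedrock": "BEDROCK_LLM",
    "gemini": "GEMINI_LLM",
}


def provider_option_var(binding: str, field: str, role: Optional[str] = None) -> str:
    """Build a base option name, or a role-level one when a role is given (hypothetical helper)."""
    name = f"{PROVIDER_PREFIX[binding.lower()]}_{field.upper()}"
    return f"{role.upper()}_{name}" if role else name


if __name__ == "__main__":
    print(provider_option_var("openai", "reasoning_effort", role="query"))
    # QUERY_OPENAI_LLM_REASONING_EFFORT
    print(provider_option_var("lollms", "temperature", role="query"))
    # QUERY_OLLAMA_LLM_TEMPERATURE
```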

Inheritance Rules

Overrides Within the Same Provider

If a role does not set {ROLE}_LLM_BINDING, or sets it to the same value as the base LLM_BINDING, the role inherits the base configuration:

  • Inherits LLM_MODEL when {ROLE}_LLM_MODEL is not set.
  • Inherits LLM_BINDING_HOST when {ROLE}_LLM_BINDING_HOST is not set.
  • Inherits LLM_BINDING_API_KEY when {ROLE}_LLM_BINDING_API_KEY is not set.
  • Inherits LLM_TIMEOUT when {ROLE}_LLM_TIMEOUT is not set.
  • Inherits MAX_ASYNC when {ROLE}_MAX_ASYNC_LLM is not set.
  • Provider options first inherit the base provider options, then apply role-specific provider options.
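
The same-provider rules above can be sketched as a resolver over an environment mapping. This is a simplified illustration, not LightRAG's implementation: the function name and return shape are hypothetical, empty strings count as set values, and built-in defaults for unset base variables are not modeled (they come back as None).

```python
from typing import Mapping

# Option prefixes per binding (from the provider prefix table).
OPTION_PREFIX = {"openai": "OPENAI_LLM", "azure_openai": "OPENAI_LLM",
                 "ollama": "OLLAMA_LLM", "lollms": "OLLAMA_LLM",
                 "bedrock": "BEDROCK_LLM", "gemini": "GEMINI_LLM"}

# field -> (role-level variable template, base variable it inherits from)
INHERITED = {
    "model": ("{role}_LLM_MODEL", "LLM_MODEL"),
    "host": ("{role}_LLM_BINDING_HOST", "LLM_BINDING_HOST"),
    "api_key": ("{role}_LLM_BINDING_API_KEY", "LLM_BINDING_API_KEY"),
    "timeout": ("{role}_LLM_TIMEOUT", "LLM_TIMEOUT"),
    "max_async": ("{role}_MAX_ASYNC_LLM", "MAX_ASYNC"),  # note the naming asymmetry
}


def resolve_same_provider(env: Mapping[str, str], role: str) -> dict:
    """Resolve one role's settings under the same-provider rules (hypothetical sketch)."""
    role = role.upper()
    binding = env["LLM_BINDING"]
    if env.get(f"{role}_LLM_BINDING", binding) != binding:
        raise ValueError(f"{role} is cross-provider; same-provider inheritance does not apply")
    resolved = {"binding": binding}
    for field, (role_template, base_var) in INHERITED.items():
        # The role-level value wins; otherwise fall back to the base variable (or None).
        resolved[field] = env.get(role_template.format(role=role), env.get(base_var))
    # Provider options: copy the base options first, then let role-level options override.
    base_prefix = OPTION_PREFIX[binding] + "_"
    role_prefix = f"{role}_{base_prefix}"
    options = {k[len(base_prefix):]: v for k, v in env.items() if k.startswith(base_prefix)}
    options.update({k[len(role_prefix):]: v for k, v in env.items() if k.startswith(role_prefix)})
    resolved["options"] = options
    return resolved


if __name__ == "__main__":
    env = {
        "LLM_BINDING": "openai",
        "LLM_MODEL": "gpt-5-mini",
        "LLM_BINDING_HOST": "https://api.openai.com/v1",
        "LLM_BINDING_API_KEY": "your_api_key",
        "OPENAI_LLM_REASONING_EFFORT": "minimal",
        "QUERY_LLM_MODEL": "gpt-5",
    }
    query = resolve_same_provider(env, "query")
    print(query["model"], query["host"], query["options"])
    # gpt-5 https://api.openai.com/v1 {'REASONING_EFFORT': 'minimal'}
```

Because base options are copied before role options are applied, a global OPENAI_LLM_REASONING_EFFORT reaches every same-provider role that does not set its own value.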

Therefore, when you only want to change the model within the same provider, you only need to set the model name:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

# QUERY inherits host, API key, timeout, concurrency, and OPENAI_LLM_REASONING_EFFORT
QUERY_LLM_MODEL=gpt-5
```

Cross-Provider Overrides

If a role's {ROLE}_LLM_BINDING differs from the base LLM_BINDING, it is a cross-provider configuration. The current rules are:

  • {ROLE}_LLM_MODEL must be set.
  • Non-Bedrock providers must set {ROLE}_LLM_BINDING_API_KEY.
  • If {ROLE}_LLM_BINDING_HOST is not set, LightRAG tries to use that provider's default host.
  • Provider options do not inherit base provider options. They start empty and only apply role-specific provider options.
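
These checks can be sketched as a small pre-flight validator. It is a hypothetical helper, not LightRAG's own validation: it only reports problems and does not try to guess provider default hosts. The Bedrock API key check runs for every Bedrock role, since this guide states that Bedrock never accepts {ROLE}_LLM_BINDING_API_KEY.

```python
from typing import List, Mapping, Tuple


def check_role_binding(env: Mapping[str, str], role: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one role under the cross-provider rules (hypothetical sketch)."""
    role = role.upper()
    base = env["LLM_BINDING"]
    binding = env.get(f"{role}_LLM_BINDING", base)
    errors: List[str] = []
    warnings: List[str] = []
    has_key = bool(env.get(f"{role}_LLM_BINDING_API_KEY"))
    if binding == "bedrock" and has_key:
        # Bedrock never accepts a generic API key, whether same-provider or not.
        errors.append(f"{role}_LLM_BINDING_API_KEY is not supported for bedrock")
    if binding != base:  # cross-provider: nothing is inherited from the base provider
        if not env.get(f"{role}_LLM_MODEL"):
            errors.append(f"{role}_LLM_MODEL must be set")
        if binding != "bedrock" and not has_key:
            errors.append(f"{role}_LLM_BINDING_API_KEY must be set")
        if not env.get(f"{role}_LLM_BINDING_HOST"):
            warnings.append(f"{role}_LLM_BINDING_HOST is unset; the {binding} default host will be tried")
    return errors, warnings


if __name__ == "__main__":
    env = {"LLM_BINDING": "openai", "KEYWORD_LLM_BINDING": "ollama", "KEYWORD_LLM_MODEL": "qwen3.5:9b"}
    print(check_role_binding(env, "keyword"))
```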

Example: use Ollama as the base for local extraction, then use OpenAI for final answers:

```env
LLM_BINDING=ollama
LLM_MODEL=qwen3.5:9b
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_LLM_NUM_CTX=32768

QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5-mini
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=minimal
```

For cross-provider configurations, explicitly setting {ROLE}_LLM_BINDING_HOST is recommended to avoid confusion between the default host and the base provider endpoint.

Bedrock Authentication Rules

Bedrock does not use LLM_BINDING_API_KEY and does not support {ROLE}_LLM_BINDING_API_KEY. Available authentication methods are:

  • Global SigV4: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and AWS_REGION.
  • Role-level SigV4: {ROLE}_AWS_ACCESS_KEY_ID, {ROLE}_AWS_SECRET_ACCESS_KEY, {ROLE}_AWS_SESSION_TOKEN, and {ROLE}_AWS_REGION.
  • Process-level bearer token: AWS_BEARER_TOKEN_BEDROCK. This is an AWS SDK process-level setting and cannot be overridden per role.

Role-level Bedrock example:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

EXTRACT_LLM_BINDING=bedrock
EXTRACT_LLM_MODEL=us.amazon.nova-lite-v1:0
EXTRACT_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
EXTRACT_AWS_REGION=us-west-2
EXTRACT_AWS_ACCESS_KEY_ID=your_extract_access_key
EXTRACT_AWS_SECRET_ACCESS_KEY=your_extract_secret_key
EXTRACT_AWS_SESSION_TOKEN=your_optional_session_token
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048
```

Provider Behavior Matrix

| Provider | Role-level host/base_url | Role-level API key | Authentication limitations |
| --- | --- | --- | --- |
| openai | Supported, passed to the OpenAI-compatible client through {ROLE}_LLM_BINDING_HOST. | Supports {ROLE}_LLM_BINDING_API_KEY; when unset within the same provider, it inherits the base LLM_BINDING_API_KEY. | Currently mainly API key / Bearer mode. |
| ollama | Supported, passed to the Ollama client through {ROLE}_LLM_BINDING_HOST. | Supports {ROLE}_LLM_BINDING_API_KEY; when unset within the same provider, it inherits the base key. If no key reaches the lower layer, it falls back to OLLAMA_API_KEY. | Bearer header. |
| lollms | Supported, using {ROLE}_LLM_BINDING_HOST as base_url. | Supports {ROLE}_LLM_BINDING_API_KEY; when unset within the same provider, it inherits the base key. | Bearer header. |
| azure_openai | Supported, using {ROLE}_LLM_BINDING_HOST as the Azure endpoint. | Supports {ROLE}_LLM_BINDING_API_KEY; when unset within the same provider, it inherits the base key and may also fall back to AZURE_OPENAI_API_KEY. | AZURE_OPENAI_API_VERSION is a global environment variable and does not support role-level overrides. |
| bedrock | Supported, using {ROLE}_LLM_BINDING_HOST as endpoint_url; DEFAULT_BEDROCK_ENDPOINT means letting the AWS SDK choose. | Generic API keys are not supported. | Uses global or role-level SigV4. AWS_BEARER_TOKEN_BEDROCK is process-level and cannot be overridden per role. |
| gemini | Supported, passed to the Google GenAI client through {ROLE}_LLM_BINDING_HOST; DEFAULT_GEMINI_ENDPOINT means using the SDK default endpoint. | AI Studio mode supports {ROLE}_LLM_BINDING_API_KEY. | Vertex AI is controlled by GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and GOOGLE_APPLICATION_CREDENTIALS; all are process-level settings. |
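
The API key column of the matrix can be read as a fallback chain. The sketch below is a simplified, hypothetical model of that chain: the role key, then the base key for same-provider roles, then the provider-level OLLAMA_API_KEY or AZURE_OPENAI_API_KEY variable. Treating empty strings as unset is an assumption, and Gemini Vertex AI and other Azure-specific variables are not modeled.

```python
from typing import Mapping, Optional

# Provider-level fallback variables named in the behavior matrix.
KEY_FALLBACK = {"ollama": "OLLAMA_API_KEY", "azure_openai": "AZURE_OPENAI_API_KEY"}


def _get(env: Mapping[str, str], name: str) -> Optional[str]:
    # Assumption for this sketch: an empty string behaves like an unset variable.
    return env.get(name) or None


def resolve_api_key(env: Mapping[str, str], role: str) -> Optional[str]:
    """Return the API key a role would end up with under the simplified chain."""
    role = role.upper()
    base = env["LLM_BINDING"]
    binding = env.get(f"{role}_LLM_BINDING", base)
    if binding == "bedrock":
        return None  # Bedrock uses SigV4 or AWS_BEARER_TOKEN_BEDROCK, never a generic key
    key = _get(env, f"{role}_LLM_BINDING_API_KEY")
    if key is None and binding == base:
        key = _get(env, "LLM_BINDING_API_KEY")  # same-provider inheritance
    if key is None and binding in KEY_FALLBACK:
        key = _get(env, KEY_FALLBACK[binding])  # provider-level environment fallback
    return key


if __name__ == "__main__":
    env = {"LLM_BINDING": "ollama", "OLLAMA_API_KEY": "from-ollama-env"}
    print(resolve_api_key(env, "extract"))  # from-ollama-env
```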

1. Same Provider, Only Change the Model

Suitable when you keep the same OpenAI key and endpoint but want a stronger model for final answers:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

QUERY_LLM_MODEL=gpt-5
QUERY_MAX_ASYNC_LLM=2
```

QUERY inherits the base host, API key, and OPENAI_LLM_REASONING_EFFORT.

2. Same Provider, Change the Model and Tune Options

Suitable when the base model is used for extraction and final answers use a higher reasoning effort:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal
OPENAI_LLM_MAX_COMPLETION_TOKENS=4096

QUERY_LLM_MODEL=gpt-5
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_LLM_TIMEOUT=240
```

3. Same Provider with Different Endpoints and API Keys

Suitable when all roles use the openai binding, but some roles access the official OpenAI API while others access a local vLLM, SGLang, OpenRouter, or another OpenAI-compatible endpoint. In the example below:

  • EXTRACT uses the official OpenAI gpt-5-mini.
  • QUERY uses the official OpenAI gpt-5.4 with a separate OpenAI key.
  • KEYWORD uses Qwen3.5-35B-A3B served by a local vLLM instance.

```env
###########################################################################
# Base LLM fallback. Keep it aligned with EXTRACT so unspecified roles still
# have a valid OpenAI configuration.
###########################################################################
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_extract_openai_api_key
LLM_TIMEOUT=180
MAX_ASYNC=4

###########################################################################
# IMPORTANT:
# Do not set global OPENAI_LLM_REASONING_EFFORT here if any same-provider role
# points to a local OpenAI-compatible server that does not support it.
# Use role-specific OPENAI options instead.
###########################################################################
# OPENAI_LLM_REASONING_EFFORT=none

###########################################################################
# EXTRACT: OpenAI official API, gpt-5-mini
###########################################################################
EXTRACT_LLM_BINDING=openai
EXTRACT_LLM_MODEL=gpt-5-mini
EXTRACT_LLM_BINDING_HOST=https://api.openai.com/v1
EXTRACT_LLM_BINDING_API_KEY=your_extract_openai_api_key
EXTRACT_OPENAI_LLM_REASONING_EFFORT=low
EXTRACT_OPENAI_LLM_MAX_COMPLETION_TOKENS=4096
EXTRACT_MAX_ASYNC_LLM=4
EXTRACT_LLM_TIMEOUT=180

###########################################################################
# QUERY: OpenAI official API, gpt-5.4, separate API key
###########################################################################
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5.4
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240

###########################################################################
# KEYWORD: local vLLM OpenAI-compatible endpoint, Qwen3.5-35B-A3B
###########################################################################
KEYWORD_LLM_BINDING=openai
KEYWORD_LLM_MODEL=Qwen3.5-35B-A3B
KEYWORD_LLM_BINDING_HOST=http://localhost:8000/v1
# If vLLM was started with --api-key, use the same value here.
# If vLLM has no auth, still set a non-empty dummy value to avoid falling
# back to the official OpenAI key.
KEYWORD_LLM_BINDING_API_KEY=local-vllm-api-key
KEYWORD_OPENAI_LLM_MAX_TOKENS=2048
# Optional for Qwen-style models served by vLLM when you want to disable thinking.
KEYWORD_OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'
KEYWORD_MAX_ASYNC_LLM=4
KEYWORD_LLM_TIMEOUT=180
```

This pattern is not cross-provider because all three roles use the openai binding. LightRAG passes each role's *_LLM_BINDING_HOST and *_LLM_BINDING_API_KEY to the OpenAI-compatible client separately.

Note: within the same provider, roles inherit the base OPENAI_LLM_* options. If the local vLLM server does not support official OpenAI parameters such as reasoning_effort, do not set the global OPENAI_LLM_REASONING_EFFORT; set role-level variables such as EXTRACT_OPENAI_LLM_REASONING_EFFORT and QUERY_OPENAI_LLM_REASONING_EFFORT instead.
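
One way to catch this mistake before starting the server is to scan the environment for same-provider roles that would inherit the global option while pointing at a non-official endpoint. The sketch below is hypothetical and not part of LightRAG: it treats any host other than https://api.openai.com/v1 as possibly unable to accept the option (only a heuristic), and it assumes an unset base host means the official API.

```python
from typing import List, Mapping

ROLES = ("EXTRACT", "KEYWORD", "QUERY", "VLM")
OFFICIAL_OPENAI_HOST = "https://api.openai.com/v1"


def roles_inheriting_openai_option(env: Mapping[str, str], field: str = "REASONING_EFFORT") -> List[str]:
    """List same-provider roles that inherit OPENAI_LLM_<field> while using a non-official host."""
    if env.get("LLM_BINDING") != "openai" or f"OPENAI_LLM_{field}" not in env:
        return []
    base_host = env.get("LLM_BINDING_HOST", OFFICIAL_OPENAI_HOST)  # assumption: unset means official API
    flagged = []
    for role in ROLES:
        if env.get(f"{role}_LLM_BINDING", "openai") != "openai":
            continue  # cross-provider roles start with empty provider options
        if f"{role}_OPENAI_LLM_{field}" in env:
            continue  # an explicit role-level value replaces the inherited one
        host = env.get(f"{role}_LLM_BINDING_HOST", base_host)
        if host.rstrip("/") != OFFICIAL_OPENAI_HOST:
            flagged.append(role)  # heuristic: non-official endpoints may reject the option
    return flagged


if __name__ == "__main__":
    env = {
        "LLM_BINDING": "openai",
        "LLM_BINDING_HOST": "https://api.openai.com/v1",
        "OPENAI_LLM_REASONING_EFFORT": "none",  # the line the scenario above warns against
        "EXTRACT_OPENAI_LLM_REASONING_EFFORT": "low",
        "QUERY_OPENAI_LLM_REASONING_EFFORT": "medium",
        "KEYWORD_LLM_BINDING": "openai",
        "KEYWORD_LLM_BINDING_HOST": "http://localhost:8000/v1",
    }
    print(roles_inheriting_openai_option(env))  # ['KEYWORD']
```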

4. One Role Crosses Provider

Suitable when the base uses an official OpenAI model and only keyword extraction uses local Ollama:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key
OPENAI_LLM_REASONING_EFFORT=medium

KEYWORD_LLM_BINDING=ollama
KEYWORD_LLM_MODEL=qwen3.5:9b
KEYWORD_LLM_BINDING_HOST=http://localhost:11434
KEYWORD_LLM_BINDING_API_KEY=ollama-local-key
KEYWORD_OLLAMA_LLM_NUM_CTX=32768
```

In cross-provider configurations, Ollama options do not inherit OpenAI options. Because cross-provider validation currently requires every non-Bedrock role to provide an explicit role-level API key, set KEYWORD_LLM_BINDING_API_KEY even for a local Ollama server; a placeholder value usually works.

5. Specify a Dedicated Multimodal Model for VLM

Suitable when text tasks use a cheaper model and multimodal analysis uses a vision-language model:

```env
VLM_PROCESS_ENABLE=true

LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

VLM_LLM_BINDING=openai
VLM_LLM_MODEL=gpt-4o
VLM_OPENAI_LLM_MAX_TOKENS=4096
VLM_MAX_ASYNC_LLM=2
VLM_LLM_TIMEOUT=240
```

If VLM uses the same provider and key, VLM_LLM_BINDING_HOST and VLM_LLM_BINDING_API_KEY can be omitted.

VLM_PROCESS_ENABLE is the master switch for multimodal analysis. When false, the pipeline emits a warning and skips every multimodal item without invoking the VLM. When true, the effective VLM binding (VLM_LLM_BINDING if set, otherwise LLM_BINDING) must support image inputs. The following providers are vision-capable: openai, azure_openai, gemini, bedrock, ollama, anthropic. lollms is rejected at startup because it cannot accept image inputs.
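
The enable switch and binding check can be summarized in a few lines. This is a hypothetical sketch, not LightRAG's startup code: it assumes only the string true (case-insensitive) enables multimodal analysis and that an unset VLM_PROCESS_ENABLE counts as false, and it uses the vision-capable list from the paragraph above.

```python
from typing import Mapping, Optional

# Vision-capable bindings as listed in this guide; lollms is rejected at startup.
VISION_CAPABLE = {"openai", "azure_openai", "gemini", "bedrock", "ollama", "anthropic"}


def effective_vlm_binding(env: Mapping[str, str]) -> Optional[str]:
    """Return the binding that will serve VLM calls, or None when multimodal analysis is off."""
    # Assumption: only "true" enables it; an unset value is treated as "false".
    if env.get("VLM_PROCESS_ENABLE", "false").strip().lower() != "true":
        return None  # the pipeline warns and skips multimodal items
    binding = env.get("VLM_LLM_BINDING") or env["LLM_BINDING"]
    if binding not in VISION_CAPABLE:
        raise ValueError(f"VLM binding {binding!r} cannot accept image inputs")
    return binding


if __name__ == "__main__":
    env = {"VLM_PROCESS_ENABLE": "true", "LLM_BINDING": "openai", "VLM_LLM_BINDING": "openai"}
    print(effective_vlm_binding(env))  # openai
```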

6. Bedrock Role-Level SigV4 Credentials

Suitable when only one role accesses Bedrock and uses independent IAM/STS credentials:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

QUERY_LLM_BINDING=bedrock
QUERY_LLM_MODEL=us.amazon.nova-lite-v1:0
QUERY_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
QUERY_AWS_REGION=us-east-1
QUERY_AWS_ACCESS_KEY_ID=your_query_access_key
QUERY_AWS_SECRET_ACCESS_KEY=your_query_secret_key
QUERY_AWS_SESSION_TOKEN=your_optional_session_token
QUERY_BEDROCK_LLM_MAX_TOKENS=4096
QUERY_BEDROCK_LLM_TEMPERATURE=0.2
```

Do not set QUERY_LLM_BINDING_API_KEY; Bedrock rejects that configuration.

Caveats

  • Within the same provider, provider options such as OPENAI_LLM_REASONING_EFFORT, OPENAI_LLM_MAX_TOKENS, OLLAMA_LLM_NUM_CTX, and GEMINI_LLM_THINKING_CONFIG are inherited automatically.
  • There is currently no clean role-level way to "unset" an inherited provider option. If a model in a same-provider role does not support a base option, explicitly override that option for the role with a supported value, or configure the role as cross-provider and set only the role-specific provider options it supports.
  • AZURE_OPENAI_DEPLOYMENT and AZURE_OPENAI_API_VERSION for azure_openai are global environment variables. If AZURE_OPENAI_DEPLOYMENT is set, it may take precedence over the role model name.
  • Gemini Vertex AI mode is controlled by process-level Google environment variables, so within a single LightRAG process you cannot have some roles use Vertex AI while others use AI Studio API keys.
  • In Docker/Compose, LLM_BINDING_HOST usually needs a container-reachable address such as host.docker.internal; role-level hosts follow the same rule.
  • Restart LightRAG Server after modifying .env. Some IDE terminals preload .env, so open a new terminal session to confirm that the updated environment variables take effect.
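
To check what a given process actually sees, a small diagnostic can list the base, role-level, and base provider option variables in its environment while masking secrets. This is a standalone, hypothetical script, not a LightRAG command. It only reads the environment of the process that runs it, so values that exist only in a .env file appear only after something has loaded them into that environment.

```python
import os
import re
from typing import List, Mapping, Optional

# Base variables, role-level variables, and base provider options covered by this guide.
BASE_VARS = {"LLM_BINDING", "LLM_MODEL", "LLM_BINDING_HOST", "LLM_BINDING_API_KEY", "LLM_TIMEOUT", "MAX_ASYNC"}
PATTERN = re.compile(r"^((EXTRACT|KEYWORD|QUERY|VLM)_|(OPENAI|OLLAMA|BEDROCK|GEMINI)_LLM_)")
SECRET_SUFFIXES = ("API_KEY", "SECRET_ACCESS_KEY", "SESSION_TOKEN")


def llm_env_report(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return sorted NAME=value lines for LLM-related variables, with secret values masked."""
    env = os.environ if env is None else env
    lines = []
    for name in sorted(env):
        if name not in BASE_VARS and not PATTERN.match(name):
            continue  # unrelated variable such as PATH
        value = env[name]
        if name.endswith(SECRET_SUFFIXES):
            value = f"<set, {len(value)} chars>"  # never print the secret itself
        lines.append(f"{name}={value}")
    return lines


if __name__ == "__main__":
    print("\n".join(llm_env_report()))
```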