Application Settings

Configure watchdog and backend request settings

Watchdog Settings

Configure automatic monitoring and management of backend processes

Enable Watchdog

Enable automatic monitoring of backend processes

Enable Idle Check

Automatically stop backends that are idle for too long

Idle Timeout

Time before an idle backend is stopped (e.g., 15m, 1h)

Enable Busy Check

Automatically stop backends that are busy for too long (stuck processes)

Busy Timeout

Time before a busy backend is stopped (e.g., 5m, 30m)

Check Interval

How often the watchdog checks backends and memory usage (e.g., 2s, 30s)
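
The idle and busy checks above can be pictured with a minimal sketch. It assumes durations are written as number and unit pairs such as 15m, 1h, or 30s, and that a backend is stopped once it has spent longer than the matching timeout in one state. The names parse_duration and should_stop are hypothetical and are not taken from the application.

```python
import re
from datetime import timedelta

# Seconds per supported unit (assumption: only h, m and s are accepted here).
_UNITS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "15m", "1h", "30s" or "1h30m" (hypothetical helper)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input that contains anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNITS[u] for n, u in parts))


def should_stop(state: str, time_in_state: timedelta,
                idle_timeout: timedelta, busy_timeout: timedelta,
                idle_check: bool = True, busy_check: bool = True) -> bool:
    """Return True if a backend in `state` ("idle" or "busy") exceeded its timeout."""
    if state == "idle":
        return idle_check and time_in_state > idle_timeout
    if state == "busy":
        return busy_check and time_in_state > busy_timeout
    return False
```

A watchdog loop built on this sketch would call should_stop for every backend once per check interval.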

Force Eviction When Busy

Allow evicting models even when they have active API calls (default: disabled for safety)

LRU Eviction Max Retries

Maximum number of retries when waiting for busy models to become idle (default: 30)

LRU Eviction Retry Interval

Interval between retries when waiting for busy models to become idle (e.g., 1s, 2s; default: 1s)
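
How the three eviction settings might interact can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the function wait_and_evict and its callbacks are hypothetical, and the sketch assumes that forced eviction skips the wait entirely and that eviction is abandoned once the retries are used up.

```python
import time
from typing import Callable


def wait_and_evict(is_busy: Callable[[], bool], evict: Callable[[], None],
                   max_retries: int = 30, retry_interval: float = 1.0,
                   force: bool = False,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to evict a model; return True if it was evicted (hypothetical helper)."""
    if force:
        # Assumption: forced eviction ignores active API calls.
        evict()
        return True
    # One initial check plus up to `max_retries` further checks.
    for attempt in range(max_retries + 1):
        if not is_busy():
            evict()
            return True
        if attempt < max_retries:
            sleep(retry_interval)
    # The model stayed busy for every check, so it is left loaded.
    return False
```

With the defaults listed above (30 retries, 1s interval), this sketch would wait roughly 30 seconds for a busy model before giving up.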

Memory Reclaimer

Automatically evict backends when memory usage exceeds a threshold. Usage is measured from GPU VRAM if available, otherwise from system RAM. Backends are evicted in least recently used (LRU) order.

Current Memory Status

System RAM

Memory monitoring unavailable

Enable Memory Reclaimer

Evict backends when memory usage exceeds threshold

Memory Threshold (%)

Backends are evicted when memory usage exceeds this percentage (allowed range: 50-100%)
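
The threshold rule can be illustrated with a short sketch. It assumes the reclaimer evicts least recently used backends one at a time until usage is back at or below the threshold; whether the application evicts one or several backends per check is not stated here. The function select_evictions and its tuple layout are hypothetical.

```python
from typing import List, Tuple


def select_evictions(used: int, total: int, threshold_percent: float,
                     backends: List[Tuple[str, float, int]]) -> List[str]:
    """Pick backends to evict, oldest first (hypothetical helper).

    Each backend is (name, last_used_timestamp, memory_in_bytes).
    """
    if not 50 <= threshold_percent <= 100:
        raise ValueError("threshold must be between 50 and 100 percent")
    limit = total * threshold_percent / 100
    victims = []
    # Walk backends from least to most recently used.
    for name, _last_used, size in sorted(backends, key=lambda b: b[1]):
        if used <= limit:
            break
        victims.append(name)
        used -= size
    return victims
```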

Backend Request Settings

Configure how backends handle multiple requests

Max Active Backends

Maximum number of models to keep loaded at once (0 = unlimited, 1 = single backend mode). The least recently used models are evicted when the limit is reached.
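
The limit and its LRU behaviour can be sketched with an ordered map. The class BackendPool below is a hypothetical illustration and assumes eviction happens at the moment a new model is requested while the pool is full.

```python
from collections import OrderedDict
from typing import List


class BackendPool:
    """Tracks loaded models in least-to-most recently used order (hypothetical)."""

    def __init__(self, max_active: int = 0):
        self.max_active = max_active  # 0 = unlimited
        self._loaded = OrderedDict()
        self.evicted: List[str] = []

    def use(self, model: str) -> None:
        if model in self._loaded:
            # Already loaded: mark it as the most recently used.
            self._loaded.move_to_end(model)
            return
        # Make room by evicting the least recently used models.
        while self.max_active > 0 and len(self._loaded) >= self.max_active:
            victim, _ = self._loaded.popitem(last=False)
            self.evicted.append(victim)
        self._loaded[model] = True

    def loaded(self) -> List[str]:
        return list(self._loaded)
```

With max_active set to 1, every request for a different model evicts the one currently loaded, which matches the single backend mode described above.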

Parallel Backend Requests

Enable backends to handle multiple requests in parallel (if supported)

Performance Settings

Configure default performance parameters for models

Default Threads

Number of threads to use for model inference (0 = auto)

Default Context Size

Default context window size for models

F16 Precision

Use 16-bit floating point precision

Debug Mode

Enable debug logging

Enable Tracing

Enable tracing of requests and responses

Tracing Max Items

Maximum number of tracing items to keep

API Settings

Configure CORS and CSRF protection

Enable CORS

Enable Cross-Origin Resource Sharing

CORS Allow Origins

Comma-separated list of allowed origins

Enable CSRF Protection

Enable Cross-Site Request Forgery protection

P2P Settings

Configure peer-to-peer networking

P2P Token

Authentication token for P2P network (set to 0 to generate a new token)

P2P Network ID

Network identifier for P2P connections

Federated Mode

Enable federated instance mode

Agent Jobs Settings

Configure agent job retention and cleanup

Job Retention Days

Number of days to keep job history (default: 30)

Open Responses Settings

Configure Open Responses API response storage

Response Store TTL

Time-to-live for stored responses (e.g., 1h, 30m; 0 = no expiration)
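
The special value 0 is worth a small illustration. The sketch below assumes a stored response expires once its age reaches the TTL and that a zero TTL disables expiry altogether; is_expired is a hypothetical name.

```python
from datetime import datetime, timedelta


def is_expired(stored_at: datetime, now: datetime, ttl: timedelta) -> bool:
    """Return True if a stored response has outlived its TTL (hypothetical helper)."""
    if ttl == timedelta(0):
        # 0 = no expiration: the response is kept indefinitely.
        return False
    return now - stored_at >= ttl
```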

API Keys

Manage API keys for authentication. Keys from environment variables are always included.

API Keys

List of API keys (one per line or comma-separated)

Note: API keys are sensitive. Handle with care.
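
The accepted input format (one key per line or comma-separated) could be parsed along these lines. This is a sketch under assumptions: parse_api_keys is a hypothetical helper, and merging environment keys first with duplicates dropped is one plausible reading of "always included".

```python
import re
from typing import Iterable, List


def parse_api_keys(field_text: str, env_keys: Iterable[str] = ()) -> List[str]:
    """Split the text field on commas and newlines and merge with env keys."""
    keys = list(env_keys)  # keys from environment variables are always kept
    for key in re.split(r"[,\n]", field_text):
        key = key.strip()
        # Skip blank entries and duplicates.
        if key and key not in keys:
            keys.append(key)
    return keys
```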

Gallery Settings

Configure model and backend galleries

Autoload Galleries

Automatically load model galleries on startup

Autoload Backend Galleries

Automatically load backend galleries on startup

Model Galleries (JSON)

Array of gallery objects with 'url' and 'name' fields

Backend Galleries (JSON)

Array of backend gallery objects with 'url' and 'name' fields
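
Both gallery fields expect the same shape: a JSON array of objects with 'url' and 'name' fields. The sketch below shows one way to validate such a value; parse_galleries is a hypothetical helper, and the gallery name and URL in the example are placeholders rather than real galleries.

```python
import json
from typing import Dict, List


def parse_galleries(raw: str) -> List[Dict[str, str]]:
    """Validate a JSON array of gallery objects with 'url' and 'name' fields."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    galleries = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not an object")
        for field in ("url", "name"):
            value = item.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"entry {index} is missing '{field}'")
        galleries.append({"name": item["name"], "url": item["url"]})
    return galleries


# Placeholder value for illustration only.
EXAMPLE = '[{"name": "example", "url": "https://example.com/index.yaml"}]'
```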