docs/advanced_features/hicache_storage_runtime_attach_detach.md
This document explains how to dynamically attach/detach the HiCache L3 storage backend at runtime (e.g., mooncake / hf3fs / nixl / file / aibrix / eic) while SGLang is already running and serving traffic, without restarting the process.
For safety and consistency, the current implementation strictly requires these operations to happen only when the service is idle, meaning no requests are queued or running.
If the idle condition is not met, the API will fail fast (HTTP 400) and will not modify the current service state.
The control path is:
1. HTTP server (python/sglang/srt/entrypoints/http_server.py)
   - Exposes PUT /hicache/storage-backend, DELETE /hicache/storage-backend, GET /hicache/storage-backend
2. Tokenizer manager (python/sglang/srt/managers/tokenizer_control_mixin.py)
   - Forwards the request to the scheduler(s) through FanOutCommunicator
3. Scheduler (python/sglang/srt/managers/scheduler.py)
   - Checks that the service is idle, then calls tree_cache.attach_storage_backend(...) / detach_storage_backend(...)
4. HiRadixCache (python/sglang/srt/mem_cache/hiradix_cache.py)
   - Parses hicache_storage_backend_extra_config_json (supports both backend config and prefetch knobs)
   - Calls cache_controller.attach_storage_backend(...) / detach_storage_backend(...)
5. Cache controller (python/sglang/srt/managers/cache_controller.py)
   - Creates or tears down the backend instance (via StorageBackendFactory)

The Scheduler uses is_fully_idle() to decide whether an attach/detach request may proceed.
If the condition is not met, attach/detach returns an error like:

Reject attach: scheduler is not idle. #queue-req=... #running-req=...

Tip: before switching, drain upstream traffic and wait for the server to become idle, then call attach/detach.
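The tip above can be automated on the client side. The following is a minimal sketch and not part of SGLang: `call_when_idle` is a hypothetical helper, and `attempt` is any callable that performs one attach/detach call and returns its HTTP status code. It assumes that an HTTP 400 means the server was not idle; other failures may share that status, so inspect the response message before relying on it. The attempt count and delay are arbitrary illustrative values.

```python
import time
from typing import Callable


def call_when_idle(
    attempt: Callable[[], int],
    max_attempts: int = 10,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry `attempt` while the server rejects it with HTTP 400.

    Returns True once a call succeeds (2xx) and False if the server is
    still rejecting the call after `max_attempts` tries.
    """
    for i in range(max_attempts):
        status = attempt()
        if 200 <= status < 300:
            return True
        if status != 400:
            # Anything other than the fail-fast rejection is unexpected here.
            raise RuntimeError(f"unexpected HTTP status {status}")
        if i + 1 < max_attempts:
            sleep(delay_s)  # give in-flight requests time to drain
    return False
```

Because a rejected call does not modify the service state, repeating it is safe; the loop only makes sense once upstream traffic has actually been drained.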
When dp_size > 1, the tokenizer dispatches the request to all DP scheduler instances and aggregates their responses:
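- success is true only if all DP ranks return success
- message concatenates messages from all DP ranks

This is intended to prevent “silent partial success”, but it also means you may see an overall failure even though some ranks have already applied the change.
Currently there is no automatic partial rollback across DP ranks (see TODO in code). Operationally, treat a failed attach/detach under DP as possibly applied on a subset of ranks.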
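The aggregation rule described above can be illustrated with a small sketch. `RankResult` and `aggregate` are hypothetical names, and the `"; "` separator and the `rank N:` prefix are assumptions made for readability; the actual tokenizer code may format the combined message differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RankResult:
    """Outcome reported by one DP scheduler rank (hypothetical shape)."""
    success: bool
    message: str


def aggregate(results: List[RankResult]) -> Tuple[bool, str]:
    """Combine per-rank results: succeed only if every rank succeeded."""
    success = all(r.success for r in results)
    # Keep every rank's message so a partial failure stays visible.
    message = "; ".join(f"rank {i}: {r.message}" for i, r in enumerate(results))
    return success, message
```

With one rank succeeding and another failing, `aggregate` reports failure while the combined message still shows which rank went through, which is exactly the partially applied case that has no automatic rollback.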
The examples below assume your SGLang HTTP server is at http://127.0.0.1:30000.
curl -s http://127.0.0.1:30000/hicache/storage-backend
Example response:
{
"hicache_storage_backend": "mooncake",
"hicache_storage_backend_extra_config": "{\"master_server_address\":\"127.0.0.1:50051\", ...}"
}
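Note that `hicache_storage_backend_extra_config` in the response is itself a JSON-encoded string, so reading it takes a second decoding step. A minimal sketch, assuming the response has the shape shown above: `parse_status` is a hypothetical helper, and the sample body contains only the field that is visible in the example, not the elided ones.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_status(body: str) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return (backend name, decoded extra config) from a status response body."""
    doc = json.loads(body)
    backend = doc.get("hicache_storage_backend")
    raw_extra = doc.get("hicache_storage_backend_extra_config")
    # The extra config is a JSON string inside the JSON document; treating a
    # missing or empty value as "no extra config" is an assumption.
    extra = json.loads(raw_extra) if raw_extra else {}
    return backend, extra


sample = (
    '{"hicache_storage_backend": "mooncake", '
    '"hicache_storage_backend_extra_config": '
    '"{\\"master_server_address\\":\\"127.0.0.1:50051\\"}"}'
)
backend, extra = parse_status(sample)
print(backend, extra["master_server_address"])
```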
Attach with the backend name only:
curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \
-H 'Content-Type: application/json' \
-d '{
"hicache_storage_backend": "mooncake"
}'
Attach with extra backend configuration and a prefetch policy:
curl -s -X PUT http://127.0.0.1:30000/hicache/storage-backend \
-H 'Content-Type: application/json' \
-d '{
"hicache_storage_backend": "mooncake",
"hicache_storage_backend_extra_config_json": "{\"master_server_address\":\"127.0.0.1:50051\",\"protocol\":\"tcp\",\"global_segment_size\":\"4gb\",\"prefetch_threshold\":256}",
"hicache_storage_prefetch_policy": "timeout"
}'
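Hand-escaping the nested JSON in `hicache_storage_backend_extra_config_json` is error-prone. The sketch below builds the same request as the curl command above using only Python's standard library, serializing the inner config with `json.dumps`. `build_attach_request` is a hypothetical helper; the field names and values come from the example above, and the base URL is the one assumed throughout this document.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:30000"


def build_attach_request(
    backend: str,
    extra_config: Optional[Dict[str, Any]] = None,
    prefetch_policy: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a PUT /hicache/storage-backend request."""
    payload: Dict[str, Any] = {"hicache_storage_backend": backend}
    if extra_config is not None:
        # This field is sent as a JSON *string*, not as a nested object.
        payload["hicache_storage_backend_extra_config_json"] = json.dumps(extra_config)
    if prefetch_policy is not None:
        payload["hicache_storage_prefetch_policy"] = prefetch_policy
    return urllib.request.Request(
        f"{BASE_URL}/hicache/storage-backend",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_attach_request(
    "mooncake",
    extra_config={
        "master_server_address": "127.0.0.1:50051",
        "protocol": "tcp",
        "global_segment_size": "4gb",
        "prefetch_threshold": 256,
    },
    prefetch_policy="timeout",
)
# To send it: urllib.request.urlopen(req). A rejected call raises
# urllib.error.HTTPError, whose .code is 400 when the server is not idle.
```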
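Notes:
- hicache_storage_backend_extra_config_json can include both:
  - backend configuration (for example master_server_address, protocol and global_segment_size in the Mooncake request above)
  - prefetch knobs (prefetch_threshold, prefetch_timeout_base, prefetch_timeout_per_ki_token, hicache_storage_pass_prefix_keys)

Detach the current backend:
curl -s -X DELETE http://127.0.0.1:30000/hicache/storage-backend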
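The detach call can also be issued from Python. The sketch below is one possible client-side wrapper, not an SGLang API: `detach` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a running server. It relies on the documented behaviour that a call made while the service is not idle is rejected with HTTP 400.

```python
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:30000"


def detach(opener: Callable = urllib.request.urlopen) -> int:
    """Send DELETE /hicache/storage-backend and return the HTTP status code."""
    req = urllib.request.Request(
        f"{BASE_URL}/hicache/storage-backend", method="DELETE"
    )
    try:
        with opener(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses. An HTTP 400 means the call was
        # rejected and the currently attached backend is left unchanged.
        return err.code
```

Returning the status code instead of raising keeps the caller's control flow simple: a 400 can be handled by waiting for the server to become idle and calling `detach()` again.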
Notes:
- Some backends require a specific host-memory layout (page_first/page_first_direct/page_head); if the server's HiCache host-memory layout does not satisfy the backend requirements, attach will fail with an error
- server_args.hicache_storage_backend* is updated on both the tokenizer and scheduler sides
- ... HiRadixCache on demand