R-Fork (Tensor Remote Fork) is a weight loading method that uses an efficient inter-node GPU-to-GPU data transfer path to load tensors from a running SGLang instance into a new instance with zero copy. It can significantly shorten SGLang instance boot-up time by reducing model weight loading from several minutes to mere seconds.
To learn more about R-Fork, see the <a href="https://lmsys.org/blog/2025-12-10-rfork/">R-Fork blog</a>.
seed instance (NCCL backend):
python -m sglang.launch_server [args]
client instance:
python -m sglang.launch_server [args] \
--load-format remote_instance \
--remote-instance-weight-loader-seed-instance-ip [seed_instance_ip] \
--remote-instance-weight-loader-seed-instance-service-port [seed_instance_service_port] \
--remote-instance-weight-loader-send-weights-group-ports [send_weights_nccl_group_ports_list] \
--remote-instance-weight-loader-backend nccl
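The client instance can only load weights from a seed instance that is already running. A minimal sketch of a pre-launch check follows, assuming that an accepted TCP connection on the seed's service port is a good enough signal to start the client. That is an assumption, not documented R-Fork behavior: the seed may accept connections before it is fully ready to serve weights. The helper name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    Assumption: an open port only shows that something is listening; it
    does not prove the seed instance has finished loading its weights.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A launcher script could call `wait_for_port(seed_instance_ip, seed_instance_service_port)` and start the client instance only when it returns `True`.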
seed instance (TransferEngine backend):
python -m sglang.launch_server [args] \
--remote-instance-weight-loader-start-seed-via-transfer-engine
client instance:
python -m sglang.launch_server [args] \
--load-format remote_instance \
--remote-instance-weight-loader-seed-instance-ip [seed_instance_ip] \
--remote-instance-weight-loader-seed-instance-service-port [seed_instance_service_port] \
--remote-instance-weight-loader-backend transfer_engine
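The two client commands differ only in the backend name and in whether the send-weights group ports are passed. The sketch below makes that difference explicit by assembling the client flags for either backend. The function name and parameters are hypothetical, and the group ports value is passed through as an opaque string because its exact format is not specified here.

```python
from typing import List, Optional


def client_weight_loader_args(
    backend: str,
    seed_ip: str,
    seed_service_port: int,
    send_weights_group_ports: Optional[str] = None,
) -> List[str]:
    """Build the R-Fork client flags shown above for one backend."""
    if backend not in ("nccl", "transfer_engine"):
        raise ValueError(f"unsupported backend for this sketch: {backend!r}")

    args = [
        "--load-format", "remote_instance",
        "--remote-instance-weight-loader-seed-instance-ip", seed_ip,
        "--remote-instance-weight-loader-seed-instance-service-port",
        str(seed_service_port),
    ]
    if backend == "nccl":
        # Only the NCCL client command above passes the group ports.
        if send_weights_group_ports is None:
            raise ValueError("nccl backend needs send_weights_group_ports")
        args += [
            "--remote-instance-weight-loader-send-weights-group-ports",
            send_weights_group_ports,
        ]
    # For transfer_engine, any group ports value is ignored by this sketch.
    args += ["--remote-instance-weight-loader-backend", backend]
    return args
```

The returned list can be appended to `["python", "-m", "sglang.launch_server", ...]` together with the usual model arguments, for example when launching through `subprocess.run`. Note that with the `transfer_engine` backend the seed instance must also be started with `--remote-instance-weight-loader-start-seed-via-transfer-engine`.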
ModelExpress is a coordination service that manages P2P weight transfer metadata. It removes the need for direct seed IP/port configuration by providing a centralized registry that instances publish to and discover from. The ModelExpress Python package must be installed in the SGLang image.
A running ModelExpress server is required. See the ModelExpress documentation for setup instructions.
server instance:
python -m sglang.launch_server [args] \
--load-format remote_instance \
--remote-instance-weight-loader-backend modelexpress \
--modelexpress-config '{"url": "[modelexpress_grpc_host:port]", "transport": "nixl"}'
All SGLang instances use the same command shape. If no ready source exists, the instance loads weights natively and publishes metadata to ModelExpress. If a compatible source exists, it loads weights through ModelExpress P2P transfer. Set <code>"transport": "transfer_engine"</code> to use Mooncake TransferEngine instead of the default NIXL transport.
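Because `--modelexpress-config` takes a JSON string, quoting mistakes are easy to make when the command is written by hand. A minimal sketch of generating the flags programmatically follows. The function name is hypothetical and the host and port in the example are placeholders; only the `url` and `transport` keys shown above are used.

```python
import json
import shlex
from typing import List


def modelexpress_args(url: str, transport: str = "nixl") -> List[str]:
    """Build the ModelExpress flags shown above.

    transport is passed through unchanged; this document mentions
    "nixl" (default) and "transfer_engine".
    """
    config = json.dumps({"url": url, "transport": transport})
    return [
        "--load-format", "remote_instance",
        "--remote-instance-weight-loader-backend", "modelexpress",
        "--modelexpress-config", config,
    ]


# Placeholder address for illustration only.
argv = ["python", "-m", "sglang.launch_server",
        *modelexpress_args("modelexpress.example.internal:8001")]

# shlex.join quotes the JSON value so it is safe to paste into a shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` avoids shell quoting entirely; `shlex.join` is only needed when the command has to be shown or pasted as a single line.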