docs/training/weight_transfer/ipc.md
The IPC weight transfer engine uses CUDA IPC (Inter-Process Communication) handles to share GPU memory directly between the trainer and inference workers on the same node and same GPU. This avoids any data copying, making it an efficient option when colocating training and inference.
The IPC handles are generated with `torch.multiprocessing.reductions.reduce_tensor`.

!!! warning
    IPC handles involve sending serialized Python objects. When using HTTP transport, you must set `VLLM_ALLOW_INSECURE_SERIALIZATION=1` on both the server and client, because IPC handles are pickled and base64-encoded for HTTP transmission.

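
The pickling and base64 step mentioned in the warning can be illustrated with a minimal, standard-library-only sketch. The helper names (`encode_handles`, `decode_handles`), the `ipc_handles` field name, and the shape of the stand-in handle are all hypothetical and chosen for illustration; they are not taken from vLLM's actual wire format.

```python
import base64
import json
import pickle


def encode_handles(handles: dict) -> str:
    """Pickle a mapping of handles and base64-encode it into a JSON-safe string."""
    return base64.b64encode(pickle.dumps(handles)).decode("ascii")


def decode_handles(encoded: str) -> dict:
    """Reverse encode_handles.

    Unpickling can execute arbitrary code, so this is only safe when the
    sender is trusted.
    """
    return pickle.loads(base64.b64decode(encoded))


# Hypothetical stand-in for a real IPC handle (assumption: a name mapped to
# some picklable tuple containing raw bytes and tensor metadata).
fake_handles = {"layer.weight": ("placeholder_rebuild_fn", (b"\x00\x01handle", 0, (4, 4)))}

# Raw pickle bytes are not valid JSON, hence the base64 text encoding.
body = json.dumps({"ipc_handles": encode_handles(fake_handles)})  # hypothetical field name
received = decode_handles(json.loads(body)["ipc_handles"])
assert received == fake_handles
```

Because `pickle.loads` can run arbitrary code from the payload, accepting such requests is something the server has to opt into explicitly, which is consistent with the `VLLM_ALLOW_INSECURE_SERIALIZATION=1` requirement above.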
The IPC backend requires no initialization on either side. The `init_transfer_engine` call is a no-op for IPC.
IPC supports two transport modes for delivering the handles:
Ray mode is used when vLLM is running as a Ray actor:
```python
import ray

from vllm.distributed.weight_transfer.ipc_engine import (
    IPCTrainerSendWeightsArgs,
    IPCWeightTransferEngine,
)

trainer_args = IPCTrainerSendWeightsArgs(
    mode="ray",
    llm_handle=llm_actor_handle,
)

# Start the weight update on the inference side
ray.get(llm_actor_handle.start_weight_update.remote(is_checkpoint_format=True))

# Send the weights as IPC handles
IPCWeightTransferEngine.trainer_send_weights(
    iterator=model.named_parameters(),
    trainer_args=trainer_args,
)

# Finish the weight update
ray.get(llm_actor_handle.finish_weight_update.remote())
```
In Ray mode, the engine calls `llm_handle.update_weights.remote(...)` directly, passing the IPC handles via Ray's serialization.
HTTP mode is used when vLLM is running as an HTTP server:
```python
import requests

from vllm.distributed.weight_transfer.ipc_engine import (
    IPCTrainerSendWeightsArgs,
    IPCWeightTransferEngine,
)

base_url = "http://localhost:8000"

trainer_args = IPCTrainerSendWeightsArgs(
    mode="http",
    url=base_url,
)

# Start the weight update on the server
response = requests.post(
    f"{base_url}/start_weight_update",
    json={"is_checkpoint_format": True},
    timeout=60,
)
response.raise_for_status()

# Send the weights as IPC handles
IPCWeightTransferEngine.trainer_send_weights(
    iterator=model.named_parameters(),
    trainer_args=trainer_args,
)

# Finish the weight update
response = requests.post(f"{base_url}/finish_weight_update", json={}, timeout=60)
response.raise_for_status()
```
In HTTP mode, IPC handles are pickled, base64-encoded, and sent as JSON to the `/update_weights` endpoint. As in Ray mode, you must call `start_weight_update` before sending weights and `finish_weight_update` afterward.
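
The required start, send, finish ordering can be made harder to get wrong by wrapping it in a context manager. The following is a minimal sketch, not part of vLLM: `weight_update_session` and the injected `post` callable are hypothetical names, and the choice to skip `finish_weight_update` when the body raises is an assumption, since the expected behavior on failure is not specified here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def weight_update_session(
    post: Callable[[str, dict], None],
    base_url: str,
    is_checkpoint_format: bool = True,
) -> Iterator[None]:
    """Hypothetical helper: call start before the body and finish after it."""
    post(f"{base_url}/start_weight_update", {"is_checkpoint_format": is_checkpoint_format})
    yield
    # Reached only if the body raised no exception (assumption: a failed
    # transfer should not be reported as finished).
    post(f"{base_url}/finish_weight_update", {})


# Demonstration with a recording stand-in instead of a real HTTP client.
calls = []


def recording_post(url: str, payload: dict) -> None:
    calls.append((url, payload))


with weight_update_session(recording_post, "http://localhost:8000"):
    # Placeholder for IPCWeightTransferEngine.trainer_send_weights(...)
    calls.append(("send", {}))

assert [c[0].rsplit("/", 1)[-1] for c in calls] == [
    "start_weight_update",
    "send",
    "finish_weight_update",
]
```

With a real server, `post` could be something like `lambda url, payload: requests.post(url, json=payload, timeout=60).raise_for_status()`, matching the calls in the HTTP example.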
See `IPCTrainerSendWeightsArgs` for the full list of configurable fields.