# Docker Guide (XPU)
Clone the repository and navigate to the project directory:

```bash
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
```
Build the Docker image using the XPU-specific `Dockerfile.xpu`. The proxy variables are only needed if you build behind an HTTP proxy; omit them otherwise:

```bash
sudo http_proxy=$HTTP_PROXY \
  https_proxy=$HTTPS_PROXY \
  docker build \
  --build-arg http_proxy=$HTTP_PROXY \
  --build-arg https_proxy=$HTTPS_PROXY \
  -t kt_xpu:0.3.1 \
  -f Dockerfile.xpu \
  .
```
Start a container from the image, mounting your local model directory into the container:

```bash
sudo docker run -td --privileged \
  --net=host \
  --device=/dev/dri \
  --shm-size="16g" \
  -v /path/to/models:/models \
  -e http_proxy=$HTTP_PROXY \
  -e https_proxy=$HTTPS_PROXY \
  --name ktransformers_xpu \
  kt_xpu:0.3.1
```
> **Note:** Replace `/path/to/models` with your actual model directory path (e.g., `/mnt/models`).
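If the host path passed to `-v` does not exist, Docker will silently create an empty directory there, which leads to confusing "model not found" errors later. A small pre-flight check can catch this; `check_model_dir` is a hypothetical helper, not part of the project:

```bash
# Hypothetical helper: verify the host model directory exists before
# passing it to `docker run -v`.
check_model_dir() {
    if [ -d "$1" ]; then
        echo "ok: will mount $1"
    else
        echo "error: $1 not found" >&2
        return 1
    fi
}

# Example (replace /mnt/models with your model directory):
check_model_dir /mnt/models || true
```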
Attach a shell inside the running container:

```bash
sudo docker exec -it ktransformers_xpu /bin/bash
```
Inside the container, set the recommended SYCL/oneAPI environment variables:

```bash
export SYCL_CACHE_PERSISTENT=1
export ONEAPI_DEVICE_SELECTOR=level_zero:0
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
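`SYCL_CACHE_PERSISTENT=1` enables the on-disk cache for JIT-compiled kernels, and `ONEAPI_DEVICE_SELECTOR=level_zero:0` pins the workload to the first Level Zero (XPU) device. On a multi-GPU host you can select a different device; the device indices below are assumptions about your hardware, not values from this project:

```bash
# Select the second Level Zero device instead (assumes a second XPU exists):
export ONEAPI_DEVICE_SELECTOR=level_zero:1

# Or expose every Level Zero device:
export ONEAPI_DEVICE_SELECTOR="level_zero:*"
```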
Run the local chat example on the XPU:

```bash
python ktransformers/local_chat.py \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path <path_to_gguf_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml \
  --cpu_infer <cpu_cores + 1> \
  --device xpu \
  --max_new_tokens 200
```
> **Note:**
> - Replace `<path_to_gguf_files>` with the path to your GGUF model files.
> - Replace `<cpu_cores + 1>` with the number of CPU cores you want to use plus one.
> - For more configuration options and usage details, refer to the project README.
> - To run KTransformers natively on XPU (outside of Docker), please refer to xpu.md.
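Rather than hard-coding the `--cpu_infer` value, it can be derived from the machine; a minimal sketch, noting that `nproc` counts logical CPUs, so on SMT systems you may prefer the physical core count:

```bash
# cpu_infer = CPU cores used for inference + 1 (rough default from nproc)
CPU_INFER=$(( $(nproc) + 1 ))
echo "cpu_infer=$CPU_INFER"
```

The computed value can then be passed as `--cpu_infer "$CPU_INFER"` in the command above.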