docker/README-packaging.md
This directory contains scripts for building and distributing KTransformers Docker images with standardized naming conventions.

The packaging system provides standardized image naming, multi-CPU-variant builds, export to tar files for offline distribution, and publishing to DockerHub.

Docker images follow this naming pattern:
```text
sglang-v{sglang_version}_ktransformers-v{ktransformers_version}_{cpu_info}_{gpu_info}_{functionality}_{timestamp}
```
- Tar file: `sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar`
- DockerHub full tag: `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022`
- DockerHub simplified tag: `kvcache/ktransformers:v0.5.3-cu128`
| Component | Description | Example |
|---|---|---|
| sglang version | SGLang package version | v0.5.6 |
| ktransformers version | KTransformers version | v0.5.3 |
| cpu info | CPU instruction set support | x86-intel-multi (includes AMX/AVX512/AVX2) |
| gpu info | CUDA version | cu128 (CUDA 12.8) |
| functionality | Feature mode | sft_llamafactory-v0.9.3 or infer |
| timestamp | Build time (Beijing/UTC+8) | 20241212143022 |
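As a sanity check, the components above can be assembled into a full image name with a small shell function. This is an illustrative sketch; `make_image_name` is not a function from `docker-utils.sh`:

```shell
#!/bin/sh
# Illustrative helper (not part of docker-utils.sh): assemble an image
# name from its components following the documented pattern.
make_image_name() {
  # args: sglang_ver kt_ver cpu_info gpu_info functionality timestamp
  printf 'sglang-v%s_ktransformers-v%s_%s_%s_%s_%s\n' "$@"
}

make_image_name 0.5.6 0.5.3 x86-intel-multi cu128 sft_llamafactory-v0.9.3 20241212143022
# → sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022
```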
| File | Purpose |
|---|---|
| Dockerfile | Main Dockerfile with multi-CPU build and version extraction |
| docker-utils.sh | Shared utility functions for both scripts |
| build-docker-tar.sh | Build and export Docker image to tar file |
| push-to-dockerhub.sh | Build and push Docker image to DockerHub |
```bash
cd docker

# Basic build
./build-docker-tar.sh

# With specific CUDA version and mirror
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1

# With proxy
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /path/to/output
```
```bash
cd docker

# Basic push (requires --repository)
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers

# With simplified tag
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified

# Skip build if image exists
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers \
  --skip-build
```
```text
Build Configuration:
  --cuda-version VERSION    CUDA version (default: 12.8.1)
  --ubuntu-mirror 0|1       Use Tsinghua mirror (default: 0)
  --http-proxy URL          HTTP proxy URL
  --https-proxy URL         HTTPS proxy URL
  --cpu-variant VARIANT     CPU variant (default: x86-intel-multi)
  --functionality TYPE      Mode: sft or infer (default: sft)

Paths:
  --dockerfile PATH         Path to Dockerfile (default: ./Dockerfile)
  --context-dir PATH        Build context directory (default: .)
  --output-dir PATH         Output directory for tar (default: .)

Options:
  --dry-run                 Preview without building
  --keep-image              Keep Docker image after export
  --build-arg KEY=VALUE     Additional build arguments
  -h, --help                Show help message
```
All options from build-docker-tar.sh, plus:
```text
Registry Settings:
  --registry REGISTRY       Docker registry (default: docker.io)
  --repository REPO         Repository name (REQUIRED)

Options:
  --skip-build              Skip build if image exists
  --also-push-simplified    Also push simplified tag
  --max-retries N           Max push retries (default: 3)
  --retry-delay SECONDS     Delay between retries (default: 5)
```
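The retry flags can be understood as a simple loop around `docker push`. The function below is a hedged sketch of that behavior, not the actual code in push-to-dockerhub.sh:

```shell
#!/bin/sh
# Sketch of the retry loop behind --max-retries / --retry-delay.
# The real implementation lives in push-to-dockerhub.sh and may differ.
push_with_retry() {
  image="$1"; max_retries="${2:-3}"; retry_delay="${3:-5}"
  attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    if docker push "$image"; then
      return 0
    fi
    echo "Push failed (attempt $attempt/$max_retries); retrying in ${retry_delay}s..." >&2
    sleep "$retry_delay"
    attempt=$((attempt + 1))
  done
  echo "Giving up after $max_retries attempts" >&2
  return 1
}
```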
For testing on your local machine:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --output-dir ./builds \
  --keep-image
```
This will build the image, keep it in the local Docker daemon (`--keep-image`), and write the tar file to the `./builds/` directory.

For creating a production build with mirrors and proxy:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /mnt/data/releases
```
For publishing to DockerHub:
```bash
# First, login to Docker Hub
docker login

# Then push
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified
```
This creates two tags:

- `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022`
- `kvcache/ktransformers:v0.5.3-cu128`

Preview the build without actually building:
```bash
./build-docker-tar.sh --cuda-version 12.8.1 --dry-run
```
Pass additional Docker build arguments:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --build-arg SGL_VERSION=0.5.7 \
  --build-arg FLASHINFER_VERSION=0.5.4
```
```bash
# Load the image
docker load -i sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar

# Run the container
docker run -it --rm \
  --gpus all \
  sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 \
  /bin/bash
```
```bash
# Pull with full tag
docker pull kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.5.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022

# Or pull with simplified tag
docker pull kvcache/ktransformers:v0.5.3-cu128

# Run the container
docker run -it --rm \
  --gpus all \
  kvcache/ktransformers:v0.5.3-cu128 \
  /bin/bash
```
The image contains two conda environments:
```bash
# Activate serve environment (for inference with sglang)
conda activate serve
# or use the alias:
serve

# Activate fine-tune environment (for training with LLaMA-Factory)
conda activate fine-tune
# or use the alias:
finetune
```
The Docker image includes all three CPU variants (AMX, AVX512, and AVX2). The runtime automatically detects your CPU and loads the appropriate variant. To override:
```bash
# Force use of AVX2 variant
export KT_KERNEL_CPU_VARIANT=avx2
python your_script.py

# Enable debug output to see which variant is loaded
export KT_KERNEL_DEBUG=1
python your_script.py
```
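Conceptually, the auto-detection picks the most capable variant advertised in the CPU flags. The sketch below illustrates the idea using `/proc/cpuinfo`; the actual detection lives inside kt-kernel and may use different checks:

```shell
#!/bin/sh
# Illustrative sketch of CPU-variant detection (not kt-kernel's real code).
# Pass a flags string for testing, or let it read /proc/cpuinfo.
detect_cpu_variant() {
  flags="${1:-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}"
  case "$flags" in
    *amx_tile*) echo amx ;;     # Intel AMX tile support
    *avx512f*)  echo avx512 ;;  # AVX-512 foundation
    *)          echo avx2 ;;    # conservative fallback
  esac
}

# Respect a manual override, otherwise auto-detect.
export KT_KERNEL_CPU_VARIANT="${KT_KERNEL_CPU_VARIANT:-$(detect_cpu_variant)}"
```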
Versions are automatically extracted during Docker build from:
- `sglang.__version__` in the serve environment
- `version.py` in the ktransformers repository
- `llamafactory.__version__` in the fine-tune environment

The versions are saved to `/workspace/versions.env` in the image:
```bash
# View versions in running container
cat /workspace/versions.env

# Output:
SGLANG_VERSION=0.5.6
KTRANSFORMERS_VERSION=0.5.3
LLAMAFACTORY_VERSION=0.9.3
```
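Downstream tooling can consume this file to rebuild tags. The helper below is a sketch and not part of the shipped scripts; the repository name and CUDA suffix are taken from the examples in this README:

```shell
#!/bin/sh
# Sketch: derive the simplified tag (e.g. v0.5.3-cu128) from a versions.env
# file. Not part of the shipped scripts.
simplified_tag() {
  env_file="$1"; repo="$2"; cuda_suffix="$3"
  # Source the file in a subshell so its variables don't leak into the caller.
  kt_version="$(. "$env_file" && echo "$KTRANSFORMERS_VERSION")"
  printf '%s:v%s-%s\n' "$repo" "$kt_version" "$cuda_suffix"
}

# Inside the image this would be:
# simplified_tag /workspace/versions.env kvcache/ktransformers cu128
# → kvcache/ktransformers:v0.5.3-cu128
```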
Check available disk space:

```bash
df -h
```

The build requires approximately 15-20GB of disk space. Clean up unused Docker data with:

```bash
docker system prune -a
```
If version extraction fails (the versions show as "unknown"), you can verify manually by running:
```bash
docker run --rm <image> /bin/bash -c "
  source /opt/miniconda3/etc/profile.d/conda.sh &&
  conda activate serve &&
  python -c 'import sglang; print(sglang.__version__)'
"
```
- Make sure you are logged in with `docker login`
- Use the full repository name (`kvcache/ktransformers`, not just `ktransformers`)
- For unreliable networks, tune the `--max-retries` and `--retry-delay` options

To use a custom Dockerfile or build context:

```bash
./build-docker-tar.sh \
  --dockerfile /path/to/custom/Dockerfile \
  --context-dir /path/to/build/context
```
Currently, the image always includes both serve and fine-tune environments. To create an inference-only image, modify the Dockerfile to skip the fine-tune environment section.
To build only specific CPU variants, modify kt-kernel/install.sh or set environment variables in the Dockerfile.
The scripts are designed for manual execution but can be integrated into CI/CD pipelines:
```yaml
# Example GitHub Actions workflow step
- name: Build and push Docker image
  run: |
    cd docker
    ./push-to-dockerhub.sh \
      --cuda-version ${{ matrix.cuda_version }} \
      --repository ${{ secrets.DOCKER_REPOSITORY }} \
      --also-push-simplified
```
For issues and questions:
This packaging system is part of KTransformers and follows the same license.