examples/skypilot/README.md
Run verl reinforcement learning training jobs on Kubernetes clusters or cloud platforms with GPU nodes using SkyPilot.
Choose the installation based on your target platform:
# For Kubernetes only
pip install "skypilot[kubernetes]"
# For AWS
pip install "skypilot[aws]"
# For Google Cloud Platform
pip install "skypilot[gcp]"
# For Azure
pip install "skypilot[azure]"
# For multiple platforms
pip install "skypilot[kubernetes,aws,gcp,azure]"
For details, see the SkyPilot installation guide: https://docs.skypilot.co/en/latest/getting-started/installation.html
Export the API keys needed for experiment tracking and gated model access:
# For Weights & Biases tracking
export WANDB_API_KEY="your-wandb-api-key"
# For HuggingFace gated models (if needed)
export HF_TOKEN="your-huggingface-token"
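The launch commands in this README pass these variables by name, so it can help to confirm they are set before launching. The following is a minimal sketch, not part of SkyPilot or verl; the `require_env` helper is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper (not part of SkyPilot or verl): fail early if a
# required variable is unset or empty in the current shell.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that also works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check both variables before launching the multi-turn job.
require_env WANDB_API_KEY HF_TOKEN || echo "Export the variables above first."
```

The function returns a non-zero status if any listed variable is missing, so it can gate a launch command with `&&`.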
sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y
Runs PPO training on the GSM8K dataset using the Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on the examples in ../ppo_trainer/.
sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y
Runs GRPO (Group Relative Policy Optimization) training on the MATH dataset using the Qwen2.5-7B-Instruct model, with a memory-optimized configuration for 2 nodes. Based on the examples in ../grpo_trainer/.
sky launch -c verl-multiturn verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
Runs single-node training with 8xH100 GPUs for multi-turn tool use with the Qwen2.5-3B-Instruct model. Includes tool and interaction configurations for GSM8K. Based on the examples in ../sglang_multiturn/, but uses vLLM instead of SGLang.
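The three launch commands above share one pattern: a cluster name, a task YAML file, one `--secret` flag per key, and `-y`. As a rough sketch of that pattern, the hypothetical `launch_cmd` helper below (not part of SkyPilot or verl) only prints the command it would run, which makes it easy to review before executing.

```shell
# Hypothetical helper: print (do not run) the launch command for a job.
# Usage: launch_cmd <cluster-name> <task-yaml> [SECRET_NAME ...]
launch_cmd() {
  cluster=$1
  yaml=$2
  shift 2
  cmd="sky launch -c $cluster $yaml"
  # Append one --secret flag per remaining argument.
  for secret in "$@"; do
    cmd="$cmd --secret $secret"
  done
  echo "$cmd -y"
}

launch_cmd verl-ppo verl-ppo.yaml WANDB_API_KEY
launch_cmd verl-multiturn verl-multiturn-tools.yaml WANDB_API_KEY HF_TOKEN
```

The two calls print the same PPO and multi-turn commands listed above.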
The example YAML files are pre-configured with:
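- Kubernetes as the target infrastructure (infra: k8s) - can be changed to infra: aws or infra: gcp, etc.
- W&B API key, passed with --secret WANDB_API_KEY
- HuggingFace token (where needed), passed with --secret HF_TOKEN
Launch command options:
- -c <name>: Cluster name for managing the job
- --secret KEY: Pass secrets for API keys (can be used multiple times)
- -y: Skip confirmation prompt
sky status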
sky logs verl-ppo # View logs for the PPO job
ssh verl-ppo
sky status --endpoint 8265 verl-ppo # Get dashboard URL
sky down verl-ppo
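The example YAML files default to infra: k8s, as noted earlier. One way to retarget an example without editing it in place is to write a modified copy. This is a sketch under the assumption that the file contains a literal `infra: k8s` line; `switch_infra` and the output file name are hypothetical.

```shell
# Assumption: the task YAML contains the text "infra: k8s".
# Writes a modified copy to <dest>; the original file is left untouched.
# Usage: switch_infra <source-yaml> <target-infra> <dest-yaml>
switch_infra() {
  src=$1
  target=$2
  dest=$3
  sed "s/infra: k8s/infra: $target/" "$src" > "$dest"
}

# Example (illustrative output file name):
# switch_infra verl-ppo.yaml aws verl-ppo-aws.yaml
```

The copy can then be passed to sky launch in place of the original file.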