Supported Models on Ascend NPU

This section lists the models supported on Ascend NPUs, grouped into Large Language Models, Multimodal Language Models, Embedding Models, Reward Models, and Rerank Models. The mainstream DeepSeek, Qwen, and GLM series are included. The A2 Supported and A3 Supported columns indicate support on Ascend A2 and A3 series hardware, respectively. You are welcome to enable various models to meet your business requirements.

Large Language Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| DeepSeek V3/V3.1 | DeepSeek | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| DeepSeek-V3.2-W8A8 | DeepSeek | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| DeepSeek-R1-0528-W8A8 | DeepSeek | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| DeepSeek-V2-Lite-W8A8 | DeepSeek | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-32B | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-0.6B | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen3-235B-A22B-W8A8 | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen3-Coder-480B-A35B-Instruct-w8a8-QuaRot | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen2.5-7B-Instruct | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| QWQ-32B-W8A8 | Qwen | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Llama | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| AI-ModelScope/Llama-3.1-8B-Instruct | Llama | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| LLM-Research/llama-2-7b | Llama | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| LLM-Research/Llama-3.2-1B-Instruct | Llama | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| mistralai/Mistral-7B-Instruct-v0.2 | Mistral | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| google/gemma-3-4b-it | Gemma | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| microsoft/Phi-4-multimodal-instruct | Phi | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| allenai/OLMoE-1B-7B-0924 | OLMoE | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| stabilityai/stablelm-2-1_6b | StableLM | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| CohereForAI/c4ai-command-r-v01 | Command-R | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| huihui-ai/grok-2 | Grok | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| ZhipuAI/chatglm2-6b | ChatGLM | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Shanghai_AI_Laboratory/internlm2-7b | InternLM 2 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | ExaONE 3 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| xverse/XVERSE-MoE-A36B | XVERSE | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| HuggingFaceTB/SmolLM-1.7B | SmolLM | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| ZhipuAI/glm-4-9b-chat | GLM-4 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| XiaomiMiMo/MiMo-7B-RL | MiMo | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| arcee-ai/AFM-4.5B-Base | Arcee AFM-4.5B | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Howeee/persimmon-8b-chat | Persimmon | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| inclusionAI/Ling-lite | Ling | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| ibm-granite/granite-3.1-8b-instruct | Granite | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| ibm-granite/granite-3.0-3b-a800m-instruct | Granite MoE | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| AI-ModelScope/dbrx-instruct | DBRX (Databricks) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| baichuan-inc/Baichuan2-13B-Chat | Baichuan 2 (7B, 13B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| baidu/ERNIE-4.5-21B-A3B-PT | ERNIE-4.5 (4.5, 4.5MoE series) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| OpenBMB/MiniCPM3-4B | MiniCPM (v3, 4B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| moonshotai/Kimi-K2-Thinking | Kimi | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| eigen-ai-labs/gpt-oss-120b-bf16 | GPTOSS | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| allenai/OLMo-2-1124-7B-Instruct | OLMo | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| cyankiwi/MiniMax-M2-BF16 | MiniMax-M2 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| upstage/SOLAR-10.7B-Instruct-v1.0 | Solar | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| bigcode/starcoder2-7b | StarCoder2 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| arcee-ai/Trinity-Mini | Trinity (Nano, Mini) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |

Multimodal Language Models

| Models | Model Family (Variants) | A2 Supported | A3 Supported |
|---|---|---|---|
| Qwen/Qwen2.5-VL-3B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen2.5-VL-72B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-VL-30B-A3B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-VL-8B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-VL-4B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-VL-235B-A22B-Instruct | Qwen-VL | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| deepseek-ai/deepseek-vl2 | DeepSeek-VL2 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| deepseek-ai/Janus-Pro-1B | Janus-Pro (1B, 7B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| deepseek-ai/Janus-Pro-7B | Janus-Pro (1B, 7B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| openbmb/MiniCPM-V-2_6 | MiniCPM-V / MiniCPM-o | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| openbmb/MiniCPM-o-2_6 | MiniCPM-V / MiniCPM-o | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| google/gemma-3-4b-it | Gemma 3 (Multimodal) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | Mistral-Small-3.1-24B | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| microsoft/Phi-4-multimodal-instruct | Phi-4-multimodal-instruct | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| XiaomiMiMo/MiMo-VL-7B-RL | MiMo-VL (7B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| AI-ModelScope/llava-v1.6-34b | LLaVA (v1.5 & v1.6) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| lmms-lab/llava-next-72b | LLaVA-NeXT (8B, 72B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| lmms-lab/llava-onevision-qwen2-7b-ov | LLaVA-OneVision | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| moonshotai/Kimi-VL-A3B-Instruct | Kimi-VL (A3B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| ZhipuAI/GLM-4.5V | GLM-4.5V (106B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| LLM-Research/Llama-3.2-11B-Vision-Instruct | Llama 3.2 Vision (11B) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| rednote-hilab/dots.ocr | DotsVLM-OCR | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |

Embedding Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| intfloat/e5-mistral-7b-instruct | E5 (Llama/Mistral based) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| iic/gte_Qwen2-1.5B-instruct | GTE-Qwen2 | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen3-Embedding-8B | Qwen3-Embedding | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | GME (Multimodal) | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| AI-ModelScope/clip-vit-large-patch14-336 | CLIP | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| BAAI/bge-large-en-v1.5 | BGE | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |

Reward Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Llama3.1 Reward | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Shanghai_AI_Laboratory/internlm2-7b-reward | InternLM 2 Reward | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Qwen/Qwen2.5-Math-RM-72B | Qwen2.5 Reward - Math | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| Howeee/Qwen2.5-1.5B-apeach | Qwen2.5 Reward - Sequence | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |
| AI-ModelScope/Skywork-Reward-Gemma-2-27B-v0.2 | Gemma 2-27B Reward | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |

Rerank Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | BGE-Reranker | <span style="color: green;">✔</span> | <span style="color: green;">✔</span> |