Environment Variables


SGLang supports a number of environment variables for configuring its runtime behavior on Ascend NPUs. This document lists the commonly used ones and is updated over time.

Directly Used in SGLang

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_NPU_USE_MLAPO` | Uses the MLAPO fusion operator in the attention preprocessing stage of MLA models. | `false` |
| `SGLANG_USE_FIA_NZ` | Reshapes the KV cache into the FIA NZ format. Must be enabled together with `SGLANG_NPU_USE_MLAPO`. | `false` |
| `SGLANG_NPU_USE_MULTI_STREAM` | Enables dual-stream computation of shared experts and routed experts in DeepSeek models, and dual-stream computation in the DeepSeek NSA Indexer. | `false` |
| `SGLANG_NPU_DISABLE_ACL_FORMAT_WEIGHT` | Disables casting model weight tensors to a specific NPU ACL format. | `false` |
| `SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK` | Maximum number of tokens dispatched on each rank. | `128` |
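
The dependency between `SGLANG_USE_FIA_NZ` and `SGLANG_NPU_USE_MLAPO` is easy to miss when assembling a launch environment by hand. The following is a minimal sketch of a helper that builds these variables and rejects the unsupported combination. The function name `build_sglang_npu_env`, the `"true"`/`"false"` spelling of the flag values (taken from the Default Value column), and the positive-value check are assumptions for illustration and are not part of SGLang.

```python
import os


def build_sglang_npu_env(use_mlapo=False, use_fia_nz=False,
                         use_multi_stream=False,
                         disable_acl_format_weight=False,
                         max_dispatch_tokens_per_rank=128):
    """Return the SGLang Ascend NPU variables as a dict of strings.

    Hypothetical helper; defaults mirror the table above.
    """
    # The table states that SGLANG_USE_FIA_NZ must be enabled together
    # with SGLANG_NPU_USE_MLAPO, so fail early on the other combination.
    if use_fia_nz and not use_mlapo:
        raise ValueError("SGLANG_USE_FIA_NZ requires SGLANG_NPU_USE_MLAPO")
    # Assumption: a maximum token count only makes sense when positive.
    if max_dispatch_tokens_per_rank < 1:
        raise ValueError("max_dispatch_tokens_per_rank must be >= 1")

    def flag(value):
        # Assumption: flags are spelled "true"/"false" as in the table.
        return "true" if value else "false"

    return {
        "SGLANG_NPU_USE_MLAPO": flag(use_mlapo),
        "SGLANG_USE_FIA_NZ": flag(use_fia_nz),
        "SGLANG_NPU_USE_MULTI_STREAM": flag(use_multi_stream),
        "SGLANG_NPU_DISABLE_ACL_FORMAT_WEIGHT": flag(disable_acl_format_weight),
        "SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK":
            str(max_dispatch_tokens_per_rank),
    }


if __name__ == "__main__":
    # Merge into a copy of the current environment; the copy can then be
    # passed as env=... to subprocess.Popen when starting the server.
    env = dict(os.environ)
    env.update(build_sglang_npu_env(use_mlapo=True, use_fia_nz=True))
    for name in sorted(build_sglang_npu_env()):
        print(f"{name}={env[name]}")
```

Returning a plain dict rather than mutating `os.environ` keeps the helper free of side effects, so the same values can be reused for several worker processes.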

Used in DeepEP Ascend

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS` | Enables the ant-moving function in the dispatch stage. Sets the number of tokens transmitted per round on each rank. | `8192` |
| `DEEPEP_NORMAL_LONG_SEQ_ROUND` | Enables the ant-moving function in the dispatch stage. Sets the number of rounds transmitted on each rank. | `1` |
| `DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ` | Enables the ant-moving function in the combine stage. A value of `0` means disabled. | `0` |
| `MOE_ENABLE_TOPK_NEG_ONE` | Must be enabled when the expert IDs processed by DeepEP contain `-1`. | `0` |
| `DEEP_NORMAL_MODE_USE_INT8_QUANT` | Quantizes `x` to int8 in the dispatch operator and returns `(tensor, scales)`. | `0` |
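
To see which DeepEP settings a process would actually pick up, it can help to read them back with the documented defaults applied. The sketch below does that for the variables in the table above. The names `DEEPEP_DEFAULTS`, `read_deepep_settings`, and `long_seq_token_budget` are hypothetical, and the budget calculation assumes that tokens per round multiplied by the number of rounds bounds the tokens a rank transmits in the dispatch stage; the table does not state this explicitly.

```python
import os

# Defaults copied from the table above.
DEEPEP_DEFAULTS = {
    "DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS": 8192,
    "DEEPEP_NORMAL_LONG_SEQ_ROUND": 1,
    "DEEPEP_NORMAL_COMBINE_ENABLE_LONG_SEQ": 0,
    "MOE_ENABLE_TOPK_NEG_ONE": 0,
    "DEEP_NORMAL_MODE_USE_INT8_QUANT": 0,
}


def read_deepep_settings(environ=None):
    """Return the DeepEP variables as integers, using table defaults.

    `environ` is any mapping of names to strings; it defaults to
    os.environ so the function reflects the current process.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEEPEP_DEFAULTS.items():
        raw = environ.get(name)
        # Unset or empty variables fall back to the documented default.
        settings[name] = default if raw is None or raw == "" else int(raw)
    return settings


def long_seq_token_budget(settings):
    """Tokens per rank across all rounds.

    Assumption: per-round tokens times rounds is the per-rank budget.
    """
    return (settings["DEEPEP_NORMAL_LONG_SEQ_PER_ROUND_TOKENS"]
            * settings["DEEPEP_NORMAL_LONG_SEQ_ROUND"])


if __name__ == "__main__":
    current = read_deepep_settings()
    for name, value in current.items():
        print(f"{name}={value}")
    print("long-sequence token budget per rank:",
          long_seq_token_budget(current))
```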

Others

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `TASK_QUEUE_ENABLE` | Controls the optimization level of the dispatch queue for the task_queue operator. | `1` |
| `INF_NAN_MODE_ENABLE` | Controls whether the chip uses saturation mode or INF_NAN mode. | `1` |
| `STREAMS_PER_DEVICE` | Configures the maximum number of streams in the stream pool. | `32` |
| `PYTORCH_NPU_ALLOC_CONF` | Controls the behavior of the cache allocator. This variable changes memory usage and may cause performance fluctuations. | |
| `ASCEND_MF_STORE_URL` | Address of the config store in MemFabric during PD (prefill/decode) separation. Generally set to the IP address of the primary P (prefill) node with an arbitrary port number. | |
| `ASCEND_LAUNCH_BLOCKING` | Controls whether synchronous mode is enabled during operator execution. | `0` |
| `HCCL_OP_EXPANSION_MODE` | Configures the expansion position for communication algorithm scheduling. | |
| `HCCL_BUFFSIZE` | Controls the size of the buffer for data shared between two NPUs. The unit is MB, and the value must be greater than or equal to 1. | `200` |
| `HCCL_SOCKET_IFNAME` | Configures the name of the network interface used by the host during HCCL initialization. | |
| `GLOO_SOCKET_IFNAME` | Configures the network interface name for GLOO communication. | |
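
The `HCCL_BUFFSIZE` constraint (an integer number of MB, at least 1) can be checked before a job starts. The following is a small sketch of such a check together with the two interface-name variables. The helper names `hccl_buffsize_value` and `build_comm_env` and the interface name `eth0` are hypothetical, and pointing HCCL and GLOO at the same interface is an assumption that will not fit every cluster.

```python
def hccl_buffsize_value(size_mb=200):
    """Validate an HCCL_BUFFSIZE value and return it as a string.

    The table above gives the unit as MB, a minimum of 1, and a
    default of 200.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(size_mb, bool) or not isinstance(size_mb, int):
        raise ValueError("HCCL_BUFFSIZE must be an integer number of MB")
    if size_mb < 1:
        raise ValueError("HCCL_BUFFSIZE must be >= 1 (MB)")
    return str(size_mb)


def build_comm_env(ifname, hccl_buffsize_mb=200):
    """Return communication-related variables as a dict of strings.

    Assumption: HCCL and GLOO use the same network interface.
    """
    if not ifname:
        raise ValueError("ifname must be a non-empty interface name")
    return {
        "HCCL_BUFFSIZE": hccl_buffsize_value(hccl_buffsize_mb),
        "HCCL_SOCKET_IFNAME": ifname,
        "GLOO_SOCKET_IFNAME": ifname,
    }


if __name__ == "__main__":
    # "eth0" is a placeholder; use the interface name from your hosts.
    for name, value in build_comm_env("eth0", hccl_buffsize_mb=400).items():
        print(f"{name}={value}")
```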