Files
xc-llm-kunlun/docs/source/user_guide/configuration/env_vars.md
2025-12-10 17:51:24 +08:00

2.9 KiB

Environment Variables

vllm-kunlun uses the following environment variables to configure the system:

Environment Variables *Recommended value* *Function description*
unset XPU_DUMMY_EVENT *Unsets* XPU_DUMMY_EVENT variable, usually done to ensure real XPU events are used for synchronization and performance measurement.
export XPU_VISIBLE_DEVICES 0,1,2,3,4,5,6,7 *Specify visible XPU Devices*. Here, 8 devices (0 to 7) are specified for inference tasks. This is required for multi-card or distributed inference.
export XPU_USE_MOE_SORTED_THRES 1 Enables the Moe Model *Sort Optimization*.Setting to 1 usually enables this performance optimization.
export XFT_USE_FAST_SWIGLU 1 Enables the *Fast SwiGLU Ops*. SwiGLU is a common activation function, and enabling this accelerates model inference.
export XPU_USE_FAST_SWIGLU 1 Enables the *Fast SwiGLU Ops*. Similar to XFT_USE_FAST_SWIGLU, this enables the fast SwiGLU calculation in Fused MoE Fusion Ops.
export XMLIR_CUDNN_ENABLED 1 Enables XMLIR (an intermediate representation/compiler) to use the *cuDNN compatible/optimized path* (which may map to corresponding XPU optimized libraries in the KunlunCore environment).
export XPU_USE_DEFAULT_CTX 1 Sets the XPU to use the default context. Typically used to simplify environment configuration and ensure runtime consistency.
export XMLIR_FORCE_USE_XPU_GRAPH 1 *Forces the enablement of XPU Graph mode.*. This can capture and optimize the model execution graph, significantly boosting inference performance.
export VLLM_HOST_IP $(hostname -i) *Sets the host IP address for the vLLM service*. This uses a shell command to dynamically get the current host's internal IP. It's used for inter-node communication in a distributed environment.
export XMLIR_ENABLE_MOCK_TORCH_COMPILE false *Disable Mock Torch Compile Function*. Set to false to ensure the actual compilation and optimization flow is used, rather than mock mode.
USE_ORI_ROPE 1 *Control whether to use the original RoPE (Rotate Position Encoding) implementation*. Default is 1 (use original/standard RoPE). Setting to 0 may be used to enable QWEN3 (possibly the specific quantization or optimization technique of KunlunCore), but this requires specific model support.