Xinyu Dong 7be26ca617 [Bugs] Fix Docs Build Problem (#97)
2026-01-10 05:55:40 +08:00


Environment Variables

vllm-kunlun uses the following environment variables to configure the system:

| Environment variable | Recommended value | Function description |
| --- | --- | --- |
| `unset XPU_DUMMY_EVENT` | (unset) | *Unsets* the `XPU_DUMMY_EVENT` variable. This is usually done so that real XPU events are used for synchronization and performance measurement. |
| `export XPU_VISIBLE_DEVICES` | `0,1,2,3,4,5,6,7` | *Specifies the visible XPU devices*. Here, 8 devices (0 to 7) are made available for inference tasks. This is required for multi-card or distributed inference. |
| `export XPU_USE_MOE_SORTED_THRES` | `1` | Enables the *sort optimization* for MoE models. Setting it to 1 usually turns this performance optimization on. |
| `export XFT_USE_FAST_SWIGLU` | `1` | Enables the *fast SwiGLU ops*. SwiGLU is a common activation function, and enabling this accelerates model inference. |
| `export XPU_USE_FAST_SWIGLU` | `1` | Enables the *fast SwiGLU ops*. Like `XFT_USE_FAST_SWIGLU`, but applies the fast SwiGLU computation inside the fused MoE ops. |
| `export XMLIR_CUDNN_ENABLED` | `1` | Enables XMLIR (an intermediate representation/compiler) to use the *cuDNN-compatible/optimized path*, which may map to the corresponding XPU-optimized libraries in the KunlunCore environment. |
| `export XPU_USE_DEFAULT_CTX` | `1` | Makes the XPU use the default context. Typically used to simplify environment configuration and ensure runtime consistency. |
| `export XMLIR_FORCE_USE_XPU_GRAPH` | `1` | *Forces XPU Graph mode on*. This captures and optimizes the model execution graph, which can significantly boost inference performance. |
| `export VLLM_HOST_IP` | `$(hostname -i)` | *Sets the host IP address for the vLLM service*. The shell command dynamically retrieves the current host's internal IP, which is used for inter-node communication in a distributed environment. |
| `export XMLIR_ENABLE_MOCK_TORCH_COMPILE` | `false` | *Disables the mock torch compile function*. Setting it to false ensures that the actual compilation and optimization flow is used rather than mock mode. |
| `export FUSED_QK_ROPE_OP` | `0` | *Controls whether the fused QK-Norm and RoPE implementation is used*. The default is 0 (use the original/standard RoPE). Setting it to 1 may be used to enable the fused path for Qwen3 models. |
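The recommended values above can be collected into a single setup script. The following is a minimal sketch, assuming a POSIX shell on a single node with eight XPUs. The file name `kunlun_env.sh` is hypothetical. Trimming the `hostname -i` output to its first address is also an assumption added here and is not documented vllm-kunlun behavior: on some hosts `hostname -i` prints several space-separated addresses, and a space inside `VLLM_HOST_IP` would not be a valid IP.

```shell
# kunlun_env.sh (hypothetical name): environment setup for vllm-kunlun,
# using the recommended values from the table above.
# Source it (". ./kunlun_env.sh") so the exports persist in the current shell.

# Make sure real XPU events are used for synchronization and timing.
unset XPU_DUMMY_EVENT

# Expose XPU cards 0-7; adjust to the cards actually present on this node.
export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Performance switches (1 = enabled).
export XPU_USE_MOE_SORTED_THRES=1     # MoE sort optimization
export XFT_USE_FAST_SWIGLU=1          # fast SwiGLU ops
export XPU_USE_FAST_SWIGLU=1          # fast SwiGLU inside the fused MoE ops
export XMLIR_CUDNN_ENABLED=1          # cuDNN-compatible/optimized path in XMLIR
export XPU_USE_DEFAULT_CTX=1          # use the default XPU context
export XMLIR_FORCE_USE_XPU_GRAPH=1    # force XPU Graph mode
export XMLIR_ENABLE_MOCK_TORCH_COMPILE=false  # real compile flow, not mock mode

# Fused QK-Norm + RoPE: 0 keeps the standard RoPE path (the default).
export FUSED_QK_ROPE_OP=0

# Host IP for inter-node communication. Assumption: keep only the first
# address if `hostname -i` prints several, and verify it is the interface
# the nodes use to reach each other.
host_ip=$(hostname -i 2>/dev/null)
export VLLM_HOST_IP="${host_ip%% *}"
if [ -z "$VLLM_HOST_IP" ]; then
    echo "warning: could not detect host IP; set VLLM_HOST_IP manually" >&2
fi
```

Start the vLLM server from the same shell after sourcing the script so that it inherits the exported values. You can override a single switch for one launch with the standard `VAR=value command` shell form, for example `FUSED_QK_ROPE_OP=1 <launch command>`, which leaves the rest of the environment unchanged.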