xc-llm-kunlun/docs/source/user_guide/configuration/env_vars.md

# Environment Variables

vllm-kunlun uses the following environment variables to configure the system:

| *Environment Variables*                     | ***\*Recommended value\****  | ***\*Function description\****                                           |
| ---------------------------------------- | ----------------- | ------------------------------------------------------------ |
| `unset XPU_DUMMY_EVENT`                  |                   | ***\*Unsets\**** `XPU_DUMMY_EVENT` variable, usually done to ensure real XPU events are used for synchronization and performance measurement. |
| `export XPU_VISIBLE_DEVICES`             | `0,1,2,3,4,5,6,7` | ***\*Specify visible XPU Devices\****. Here, 8 devices (0 to 7) are specified for inference tasks. This is required for multi-card or distributed inference. |
| `export XPU_USE_MOE_SORTED_THRES`        | `1`               | Enables the Moe Model ***\*Sort Optimization\****.Setting to `1` usually enables this performance optimization. |
| `export XFT_USE_FAST_SWIGLU`             | `1`               | Enables the ***\*Fast SwiGLU Ops\****. SwiGLU is a common activation function, and enabling this accelerates model inference. |
| `export XPU_USE_FAST_SWIGLU`             | `1`               | Enables the ***\*Fast SwiGLU Ops\****. Similar to `XFT_USE_FAST_SWIGLU`, this enables the fast SwiGLU calculation in Fused MoE Fusion Ops. |
| `export XMLIR_CUDNN_ENABLED`             | `1`               | Enables XMLIR (an intermediate representation/compiler) to use the ***\*cuDNN compatible/optimized path\**** (which may map to corresponding XPU optimized libraries in the KunlunCore environment). |
| `export XPU_USE_DEFAULT_CTX`             | `1`               | Sets the XPU to use the default context. Typically used to simplify environment configuration and ensure runtime consistency. |
| `export XMLIR_FORCE_USE_XPU_GRAPH`       | `1`               | ***\*Forces the enablement of XPU Graph mode.\****. This can capture and optimize the model execution graph, significantly boosting inference performance. |
| `export VLLM_HOST_IP`                    | `$(hostname -i)`  | ***\*Sets the host IP address for the vLLM service\****. This uses a shell command to dynamically get the current host's internal IP. It's used for inter-node communication in a distributed environment. |
| `export XMLIR_ENABLE_MOCK_TORCH_COMPILE` | `false`           | ***\*Disable Mock Torch Compile Function\****. Set to `false` to ensure the actual compilation and optimization flow is used, rather than mock mode. |
| `USE_ORI_ROPE`                           | `1`               | ***\*Control whether to use the original RoPE (Rotate Position Encoding) implementation\****. Default is `1` (use original/standard RoPE). Setting to `0` may be used to enable QWEN3 (possibly the specific quantization or optimization technique of KunlunCore), but this requires specific model support. |