Commit vllm 0.11.0 development branch

chenyili
2025-12-10 17:51:24 +08:00
parent deab7dd0b6
commit 7c22d621fb
175 changed files with 31856 additions and 8683 deletions

@@ -14,4 +14,4 @@ vllm-kunlun uses the following environment variables to configure the system:
| `export XMLIR_FORCE_USE_XPU_GRAPH` | `1` | **Forces XPU Graph mode on.** This captures and optimizes the model execution graph, which can significantly boost inference performance. |
| `export VLLM_HOST_IP` | `$(hostname -i)` | **Sets the host IP address for the vLLM service.** The shell command resolves the current host's internal IP dynamically; the address is used for inter-node communication in a distributed environment. |
| `export XMLIR_ENABLE_MOCK_TORCH_COMPILE` | `false` | **Disables the mock torch compile function.** Set to `false` so that the actual compilation and optimization flow is used rather than mock mode. |
| `FUSED_QK_ROPE_OP` | `0` | **Controls whether to use the fused QK-Norm and RoPE implementation.** Default is `0` (use the original/standard RoPE). Setting it to `1` may be used to enable QWEN3. |
| `USE_ORI_ROPE` | `1` | **Controls whether to use the original RoPE (Rotary Position Embedding) implementation.** Default is `1` (use the original/standard RoPE). Setting it to `0` may be used to enable QWEN3 (possibly a quantization or optimization technique specific to KunlunCore), but this requires specific model support. |
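
As a minimal sketch of how the rows above might be applied, the snippet below exports the three `export`-prefixed variables with the values listed in the table before launching the service. It assumes a POSIX-style shell on a Linux host where `hostname -i` is available; the RoPE-related toggle is left unset here so that its documented default applies.

```shell
# Values are copied from the table above; adjust for your deployment.

# Force XPU Graph mode on.
export XMLIR_FORCE_USE_XPU_GRAPH=1

# Resolve the host's internal IP at launch time (assumes `hostname -i` works on this host).
export VLLM_HOST_IP="$(hostname -i)"

# Use the real compile/optimization flow rather than mock mode.
export XMLIR_ENABLE_MOCK_TORCH_COMPILE=false
```

Exporting the variables in the same shell session (or launch script) that starts the service ensures that child processes inherit them.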