Init attention backend for Intel XPU (#10656)

Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
Author: Meng, Hengyu
Date: 2025-10-21 11:41:28 +08:00
Committed by: GitHub
Parent: fb6cc7b000
Commit: b113c72e7a
18 changed files with 1210 additions and 26 deletions

@@ -26,6 +26,7 @@ The support matrix is split into two parts: MHA (standard attention) and MLA (mu
| **AITER (ROCm)** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Wave (ROCm)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Ascend (NPU)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Intel XPU** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
### MLA Backends
@@ -190,6 +191,13 @@ python3 -m sglang.launch_server \
--attention-backend ascend
```
- Intel XPU
```bash
python3 -m sglang.launch_server \
--model meta-llama/Meta-Llama-3.1-8B-Instruct \
--attention-backend intel_xpu
```
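Once the server is launched with the Intel XPU backend, a quick smoke test can be sent to its HTTP API. This is a sketch, assuming the server is listening on the default port 30000 and using SGLang's `/generate` endpoint:

```shell
# Send a minimal generation request to the locally running server
# (assumes default host/port; adjust if --host/--port were overridden).
curl -s http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 16, "temperature": 0}
      }'
```

A JSON response containing a `"text"` field indicates the backend initialized and is serving requests.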
- Wave
```bash
python3 -m sglang.launch_server \