Init attention backend for Intel XPU (#10656)
Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
@@ -26,6 +26,7 @@ The support matrix is split into two parts: MHA (standard attention) and MLA (mu
 | **AITER (ROCm)** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
 | **Wave (ROCm)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | **Ascend (NPU)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **Intel XPU** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
### MLA Backends
@@ -190,6 +191,13 @@ python3 -m sglang.launch_server \
     --attention-backend ascend
 ```

+- Intel XPU
+
+```bash
+python3 -m sglang.launch_server \
+    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
+    --attention-backend intel_xpu
+```

 - Wave
 ```bash
 python3 -m sglang.launch_server \
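As a side note, the launch examples in this diff all follow the same shape: only the `--attention-backend` value changes per hardware target. That wiring can be sketched in Python; `launch_command` is a hypothetical helper for illustration, not part of SGLang:

```python
# Sketch only: compose the sglang.launch_server command line used in the
# docs above. launch_command is an illustrative helper, not a SGLang API.
def launch_command(model: str, backend: str) -> list[str]:
    return [
        "python3", "-m", "sglang.launch_server",
        "--model", model,
        # Backend values from the docs: e.g. "ascend", "intel_xpu", "wave".
        "--attention-backend", backend,
    ]

cmd = launch_command("meta-llama/Meta-Llama-3.1-8B-Instruct", "intel_xpu")
print(" ".join(cmd))
```

Passing `intel_xpu` here mirrors the command added in this commit for Intel XPU hardware.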
||||