[Doc] Add model feature matrix table. (#4040)

### What this PR does / why we need it?
Add model feature matrix table.

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

Signed-off-by: menogrey <1299267905@qq.com>
zhangyiming
2025-11-07 11:28:05 +08:00
committed by GitHub
parent 22286fc67d
commit 46ef280105

@@ -6,78 +6,78 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 ### Generative Models
-| Model | Support | Note |
-|-------------------------------|-----------|----------------------------------------------------------------------|
-| DeepSeek V3/3.1 | ✅ | |
-| DeepSeek V3.2 EXP | ✅ | |
-| DeepSeek R1 | ✅ | |
-| DeepSeek Distill (Qwen/LLama) | ✅ | |
-| Qwen3 | ✅ | |
-| Qwen3-based | ✅ | |
-| Qwen3-Coder | ✅ | |
-| Qwen3-Moe | ✅ | |
-| Qwen3-Next | ✅ | |
-| Qwen2.5 | ✅ | |
-| Qwen2 | ✅ | |
-| Qwen2-based | ✅ | |
-| QwQ-32B | ✅ | |
-| LLama2/3/3.1 | ✅ | |
-| Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
-| Baichuan | ✅ | |
-| Baichuan2 | ✅ | |
-| Phi-4-mini | ✅ | |
-| MiniCPM | ✅ | |
-| MiniCPM3 | ✅ | |
-| Ernie4.5 | ✅ | |
-| Ernie4.5-Moe | ✅ | |
-| Gemma-2 | ✅ | |
-| Gemma-3 | ✅ | |
-| Phi-3/4 | ✅ | |
-| Mistral/Mistral-Instruct | ✅ | |
-| GLM-4.5 | ✅ | |
-| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
-| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
-| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |
-| DeepSeek V2.5 | 🟡 | Need test |
-| Mllama | 🟡 | Need test |
-| MiniMax-Text | 🟡 | Need test |
+| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
+|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
+| DeepSeek V3/3.1 | ✅ | |||||||||||||||||||
+| DeepSeek V3.2 EXP | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ❌ | | | 163840 | | [DeepSeek-V3.2-Exp tutorial](../../tutorials/DeepSeek-V3.2-Exp.md) |
+| DeepSeek R1 | ✅ | |||||||||||||||||||
+| DeepSeek Distill (Qwen/LLama) | ✅ | |||||||||||||||||||
+| Qwen3 | ✅ | |||||||||||||||||||
+| Qwen3-based | ✅ | |||||||||||||||||||
+| Qwen3-Coder | ✅ | |||||||||||||||||||
+| Qwen3-Moe | ✅ | |||||||||||||||||||
+| Qwen3-Next | ✅ | |||||||||||||||||||
+| Qwen2.5 | ✅ | |||||||||||||||||||
+| Qwen2 | ✅ | |||||||||||||||||||
+| Qwen2-based | ✅ | |||||||||||||||||||
+| QwQ-32B | ✅ | |||||||||||||||||||
+| LLama2/3/3.1 | ✅ | |||||||||||||||||||
+| Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |||||||||||||||||||
+| Baichuan | ✅ | |||||||||||||||||||
+| Baichuan2 | ✅ | |||||||||||||||||||
+| Phi-4-mini | ✅ | |||||||||||||||||||
+| MiniCPM | ✅ | |||||||||||||||||||
+| MiniCPM3 | ✅ | |||||||||||||||||||
+| Ernie4.5 | ✅ | |||||||||||||||||||
+| Ernie4.5-Moe | ✅ | |||||||||||||||||||
+| Gemma-2 | ✅ | |||||||||||||||||||
+| Gemma-3 | ✅ | |||||||||||||||||||
+| Phi-3/4 | ✅ | |||||||||||||||||||
+| Mistral/Mistral-Instruct | ✅ | |||||||||||||||||||
+| GLM-4.5 | ✅ | |||||||||||||||||||
+| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |||||||||||||||||||
+| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |||||||||||||||||||
+| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |||||||||||||||||||
+| DeepSeek V2.5 | 🟡 | Need test |||||||||||||||||||
+| Mllama | 🟡 | Need test |||||||||||||||||||
+| MiniMax-Text | 🟡 | Need test |||||||||||||||||||
 ### Pooling Models
-| Model | Support | Note |
-|-------------------------------|-----------|----------------------------------------------------------------------|
-| Qwen3-Embedding | ✅ | |
-| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |
-| XLM-RoBERTa-based | ❌ | [1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |
+| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
+|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
+| Qwen3-Embedding | ✅ | |||||||||||||||||||
+| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |||||||||||||||||||
+| XLM-RoBERTa-based | ❌ | [1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |||||||||||||||||||
 ## Multimodal Language Models
 ### Generative Models
-| Model | Support | Note |
-|--------------------------------|---------------|----------------------------------------------------------------------|
-| Qwen2-VL | ✅ | |
-| Qwen2.5-VL | ✅ | |
-| Qwen3-VL | ✅ | |
-| Qwen3-VL-MOE | ✅ | |
-| Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |
-| QVQ | ✅ | |
-| LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
-| InternVL2 | ✅ | |
-| InternVL2.5 | ✅ | |
-| Qwen2-Audio | ✅ | |
-| Aria | ✅ | |
-| LLaVA-Next | ✅ | |
-| LLaVA-Next-Video | ✅ | |
-| MiniCPM-V | ✅ | |
-| Mistral3 | ✅ | |
-| Phi-3-Vison/Phi-3.5-Vison | ✅ | |
-| Gemma3 | ✅ | |
-| LLama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
-| LLama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
-| Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |
-| Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |
-| GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |
-| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |
-| Whisper | ❌ | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |
-| Ultravox | 🟡 | Need test |
+| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
+|--------------------------------|---------------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
+| Qwen2-VL | ✅ | |||||||||||||||||||
+| Qwen2.5-VL | ✅ | |||||||||||||||||||
+| Qwen3-VL | ✅ | |||||||||||||||||||
+| Qwen3-VL-MOE | ✅ | |||||||||||||||||||
+| Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |||||||||||||||||||
+| QVQ | ✅ | |||||||||||||||||||
+| LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |||||||||||||||||||
+| InternVL2 | ✅ | |||||||||||||||||||
+| InternVL2.5 | ✅ | |||||||||||||||||||
+| Qwen2-Audio | ✅ | |||||||||||||||||||
+| Aria | ✅ | |||||||||||||||||||
+| LLaVA-Next | ✅ | |||||||||||||||||||
+| LLaVA-Next-Video | ✅ | |||||||||||||||||||
+| MiniCPM-V | ✅ | |||||||||||||||||||
+| Mistral3 | ✅ | |||||||||||||||||||
+| Phi-3-Vison/Phi-3.5-Vison | ✅ | |||||||||||||||||||
+| Gemma3 | ✅ | |||||||||||||||||||
+| LLama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
+| LLama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
+| Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |||||||||||||||||||
+| Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |||||||||||||||||||
+| GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |||||||||||||||||||
+| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |||||||||||||||||||
+| Whisper | ❌ | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |||||||||||||||||||
+| Ultravox | 🟡 | Need test |||||||||||||||||||