[Doc] update supported models (#5379)

### What this PR does / why we need it?
1. Update supported models: Llama2, Kimi-K2-Thinking, ERNIE-4.5 and Qwen3-Omni.
2. Fill in the Supported Hardware column (A2/A3) for the listed models.

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Author: zhangxinyuehfad, 2026-01-05 09:21:52 +08:00 (committed by GitHub)
Parent: 42774df744
Commit: a099b994b3

@@ -6,51 +6,52 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 ### Generative Models
-| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Context Parallel | Doc |
+| Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
-|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|-----|
+|-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| DeepSeek V3/3.1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || ✅ | [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
+| DeepSeek V3/3.1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
-| DeepSeek V3.2 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ || [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
+| DeepSeek V3.2 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 160k | ✅ | [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
-| DeepSeek R1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || ✅ | [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
+| DeepSeek R1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
-| DeepSeek Distill (Qwen/Llama) | ✅ | ||||||||||||||||||||
+| DeepSeek Distill (Qwen/Llama) | ✅ | || A2/A3 |||||||||||||||||
-| Qwen3 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
+| Qwen3 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
-| Qwen3-based | ✅ | ||||||||||||||||||||
+| Qwen3-based | ✅ | || A2/A3 |||||||||||||||||
-| Qwen3-Coder | ✅ | | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅|||||| ✅ | [Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
+| Qwen3-Coder | ✅ | | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
-| Qwen3-Moe | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || ✅ | [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
+| Qwen3-Moe | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | 256k || [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
-| Qwen3-Next | ✅ | | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ |||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
+| Qwen3-Next | ✅ | | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ ||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
-| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ ||||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
+| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ ||| ✅ |||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
-| Qwen2 | ✅ | ||||||||||||||||||||
+| Qwen2 | ✅ | || A2/A3 |||||||||||||||||
-| Qwen2-based | ✅ | ||||||||||||||||||||
+| Qwen2-based | ✅ | || A2/A3 |||||||||||||||||
-| QwQ-32B | ✅ | ||||||||||||||||||||
+| QwQ-32B | ✅ | || A2/A3 |||||||||||||||||
-| Llama2/3/3.1 | ✅ | ||||||||||||||||||||
+| Llama2/3/3.1/3.2 | ✅ | || A2/A3 |||||||||||||||||
-| Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) ||||||||||||||||||||
+| Internlm | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) || A2/A3 |||||||||||||||||
-| Baichuan | ✅ | ||||||||||||||||||||
+| Baichuan | ✅ | || A2/A3 |||||||||||||||||
-| Baichuan2 | ✅ | ||||||||||||||||||||
+| Baichuan2 | ✅ | || A2/A3 |||||||||||||||||
-| Phi-4-mini | ✅ | ||||||||||||||||||||
+| Phi-4-mini | ✅ | || A2/A3 |||||||||||||||||
-| MiniCPM | ✅ | ||||||||||||||||||||
+| MiniCPM | ✅ | || A2/A3 |||||||||||||||||
-| MiniCPM3 | ✅ | ||||||||||||||||||||
+| MiniCPM3 | ✅ | || A2/A3 |||||||||||||||||
-| Ernie4.5 | ✅ | ||||||||||||||||||||
+| Ernie4.5 | ✅ | || A2/A3 |||||||||||||||||
-| Ernie4.5-Moe | ✅ | ||||||||||||||||||||
+| Ernie4.5-Moe | ✅ | || A2/A3 |||||||||||||||||
-| Gemma-2 | ✅ | ||||||||||||||||||||
+| Gemma-2 | ✅ | || A2/A3 |||||||||||||||||
-| Gemma-3 | ✅ | ||||||||||||||||||||
+| Gemma-3 | ✅ | || A2/A3 |||||||||||||||||
-| Phi-3/4 | ✅ | ||||||||||||||||||||
+| Phi-3/4 | ✅ | || A2/A3 |||||||||||||||||
-| Mistral/Mistral-Instruct | ✅ | ||||||||||||||||||||
+| Mistral/Mistral-Instruct | ✅ | || A2/A3 |||||||||||||||||
-| GLM-4.5 | ✅ | ||||||||||||||||||||
+| GLM-4.5 | ✅ | || A2/A3 |||||||||||||||||
+| Kimi-K2-Thinking | ✅ | || A2/A3 |||||||||||||||| [Kimi-K2-Thinking](../../tutorials/Kimi-K2-Thinking.md) |
-| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) ||||||||||||||||||||
+| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |||||||||||||||||||
-| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) ||||||||||||||||||||
+| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |||||||||||||||||||
-| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) ||||||||||||||||||||
+| ChatGLM | | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |||||||||||||||||||
-| DeepSeek V2.5 | 🟡 | Need test ||||||||||||||||||||
+| DeepSeek V2.5 | 🟡 | Need test |||||||||||||||||||
-| Mllama | 🟡 | Need test ||||||||||||||||||||
+| Mllama | 🟡 | Need test |||||||||||||||||||
-| MiniMax-Text | 🟡 | Need test ||||||||||||||||||||
+| MiniMax-Text | 🟡 | Need test |||||||||||||||||||
 ### Pooling Models
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| Qwen3-Embedding | ✅ | |||||||||||||||||||
+| Qwen3-Embedding | ✅ | || A2/A3 |||||||||||||||| [Qwen3_embedding](../../tutorials/Qwen3_embedding.md)|
-| Qwen3-Reranker | ✅ | |||||||||||||||||||
+| Qwen3-Reranker | ✅ | || A2/A3 |||||||||||||||| [Qwen3_reranker](../../tutorials/Qwen3_reranker.md)|
-| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |||||||||||||||||||
+| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) || A2/A3 |||||||||||||||||
-| XLM-RoBERTa-based | ✅ | |||||||||||||||||||
+| XLM-RoBERTa-based | ✅ | || A2/A3 |||||||||||||||||
-| Bert | ✅ | |||||||||||||||||||
+| Bert | ✅ | || A2/A3 |||||||||||||||||
 ## Multimodal Language Models
@@ -58,22 +59,23 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |--------------------------------|---------------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| Qwen2-VL | ✅ | |||||||||||||||||||
+| Qwen2-VL | ✅ | || A2/A3 |||||||||||||||||
 | Qwen2.5-VL | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ |||| ✅ | ✅ | ✅ | 30k || [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
 | Qwen3-VL | ✅ | ||A2/A3|||||||✅|||||✅|✅||| [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
-| Qwen3-VL-MOE | ✅ | | ✅ |A2/A3||✅|✅|||✅|✅|✅|✅|✅|✅|✅|✅|256k||[Qwen3-VL-235B-A22B-Instruct](../../tutorials/Qwen3-VL-235B-A22B-Instruct.md)|
+| Qwen3-VL-MOE | ✅ | | ✅ | A2/A3||✅|✅|||✅|✅|✅|✅|✅|✅|✅|✅|256k||[Qwen3-VL-235B-A22B-Instruct](../../tutorials/Qwen3-VL-235B-A22B-Instruct.md)|
-| Qwen2.5-Omni | ✅ ||||||||||||||||||| [Qwen2.5-Omni](../../tutorials/Qwen2.5-Omni.md) |
+| Qwen2.5-Omni | ✅ | || A2/A3 |||||||||||||||| [Qwen2.5-Omni](../../tutorials/Qwen2.5-Omni.md) |
+| Qwen3-Omni | ✅ | || A2/A3 |||||||||||||||||
-| QVQ | ✅ | |||||||||||||||||||
+| QVQ | ✅ | || A2/A3 |||||||||||||||||
-| Qwen2-Audio | ✅ | |||||||||||||||||||
+| Qwen2-Audio | ✅ | || A2/A3 |||||||||||||||||
-| Aria | ✅ | |||||||||||||||||||
+| Aria | ✅ | || A2/A3 |||||||||||||||||
-| LLaVA-Next | ✅ | |||||||||||||||||||
+| LLaVA-Next | ✅ | || A2/A3 |||||||||||||||||
-| LLaVA-Next-Video | ✅ | |||||||||||||||||||
+| LLaVA-Next-Video | ✅ | || A2/A3 |||||||||||||||||
-| MiniCPM-V | ✅ | |||||||||||||||||||
+| MiniCPM-V | ✅ | || A2/A3 |||||||||||||||||
-| Mistral3 | ✅ | |||||||||||||||||||
+| Mistral3 | ✅ | || A2/A3 |||||||||||||||||
-| Phi-3-Vision/Phi-3.5-Vision | ✅ | |||||||||||||||||||
+| Phi-3-Vision/Phi-3.5-Vision | ✅ | || A2/A3 |||||||||||||||||
-| Gemma3 | ✅ | |||||||||||||||||||
+| Gemma3 | ✅ | || A2/A3 |||||||||||||||||
+| Llama3.2 | ✅ | || A2/A3 |||||||||||||||||
 | Llama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
-| Llama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
 | Keye-VL-8B-Preview | ❌ | [1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |||||||||||||||||||
 | Florence-2 | ❌ | [2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |||||||||||||||||||
 | GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |||||||||||||||||||
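This change drops the Context Parallel column from the generative-model table, so every row there has to lose exactly one cell. In GitHub-flavored markdown, cells beyond the header's count are ignored, so a row that kept its old width would silently lose its last cell (often the Doc link). A minimal width-check sketch, assuming rows contain no escaped `\|` and no pipes inside inline code; `count_cells` and `check_table` are hypothetical helpers, not part of vllm-ascend:

```python
# Hypothetical helpers (not part of vllm-ascend) for checking markdown table widths.
# Assumes no escaped '\|' and no '|' inside inline code spans.
def count_cells(row: str) -> int:
    inner = row.strip()
    # Drop the optional outer pipes.
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|"):
        inner = inner[:-1]
    # With the outer pipes gone, each remaining '|' separates two cells.
    return inner.count("|") + 1


def check_table(lines: list[str]) -> list[tuple[int, int]]:
    """Return (index, cell_count) for every line whose width differs from lines[0]."""
    expected = count_cells(lines[0])
    mismatches = []
    for i, line in enumerate(lines):
        n = count_cells(line)
        if n != expected:
            mismatches.append((i, n))
    return mismatches


# Generative-model table columns after this change (21 columns, no Context Parallel).
COLUMNS = [
    "Model", "Support", "Note", "BF16", "Supported Hardware", "W8A8",
    "Chunked Prefill", "Automatic Prefix Cache", "LoRA", "Speculative Decoding",
    "Async Scheduling", "Tensor Parallel", "Pipeline Parallel", "Expert Parallel",
    "Data Parallel", "Prefill-decode Disaggregation", "Piecewise AclGraph",
    "Fullgraph AclGraph", "max-model-len", "MLP Weight Prefetch", "Doc",
]
NEW_HEADER = "| " + " | ".join(COLUMNS) + " |"

# The Qwen2 row after the change (A2/A3 filled in), and the same row still
# sized for the old 22-column layout that had Context Parallel.
NEW_ROW = "| Qwen2 | ✅ | || A2/A3 " + "|" * 17
OLD_ROW = "| Qwen2 | ✅ | " + "|" * 20

print(count_cells(NEW_HEADER))                      # 21
print(check_table([NEW_HEADER, NEW_ROW, OLD_ROW]))  # [(2, 22)]
```

Only the row still sized for the old layout is reported. The pipe runs are built with `"|" * n` so the sample rows have exact widths rather than hand-counted ones.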