From 62a9fea7afd8e066e8febec518206f0d7592c0f3 Mon Sep 17 00:00:00 2001
From: 1092626063 <1092626063@qq.com>
Date: Fri, 12 Dec 2025 15:37:39 +0800
Subject: [PATCH] 【doc】Add model feature matrix (#4950)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What this PR does / why we need it?

Add a model feature matrix to the doc tutorials, covering:

- DeepSeek R1
- DeepSeek V3.1
- Qwen3-Dense
- Qwen3-Moe
- Qwen3-Next
- Qwen2.5
- Qwen2.5-VL
- Qwen3-VL

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

---------

Signed-off-by: 1092626063 <1092626063@qq.com>
---
 .../support_matrix/supported_models.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/source/user_guide/support_matrix/supported_models.md b/docs/source/user_guide/support_matrix/supported_models.md
index c7077f9a..4f5a65f1 100644
--- a/docs/source/user_guide/support_matrix/supported_models.md
+++ b/docs/source/user_guide/support_matrix/supported_models.md
@@ -8,16 +8,16 @@
 Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |-------------------------------|-----------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
-| DeepSeek V3/3.1 | ✅ | |||||||||||||||||||
-| DeepSeek V3.2 EXP | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ❌ | | | 163840 | | [DeepSeek-V3.2-Exp tutorial](../../tutorials/DeepSeek-V3.2.md) |
-| DeepSeek R1 | ✅ | |||||||||||||||||||
+| DeepSeek V3/3.1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240k || [DeepSeek-V3.1](../../tutorials/DeepSeek-V3.1.md) |
+| DeepSeek V3.2 EXP | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ | ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ❌ ||| 160k || [DeepSeek-V3.2](../../tutorials/DeepSeek-V3.2.md) |
+| DeepSeek R1 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ || ✅ || ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 128k || [DeepSeek-R1](../../tutorials/DeepSeek-R1.md) |
 | DeepSeek Distill (Qwen/Llama) | ✅ | |||||||||||||||||||
-| Qwen3 | ✅ | |||||||||||||||||||
+| Qwen3 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ ||| ✅ || ✅ | ✅ | 128k | ✅ | [Qwen3-Dense](../../tutorials/Qwen3-Dense.md) |
 | Qwen3-based | ✅ | |||||||||||||||||||
-| Qwen3-Coder | ✅ | | A2/A3 |✅||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
-| Qwen3-Moe | ✅ | |||||||||||||||||||
-| Qwen3-Next | ✅ | |||||||||||||||||||
-| Qwen2.5 | ✅ | |||||||||||||||||||
+| Qwen3-Coder | ✅ | | ✅ | A2/A3 ||✅|✅|✅|||✅|✅|✅|✅||||||[Qwen3-Coder-30B-A3B tutorial](../../tutorials/Qwen3-Coder-30B-A3B.md)|
+| Qwen3-Moe | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ || ✅ | ✅ | ✅ | ✅ | ✅ ||| [Qwen3-235B-A22B](../../tutorials/Qwen3-235B-A22B.md) |
+| Qwen3-Next | ✅ | | ✅ | A2/A3 | ✅ |||||| ✅ ||| ✅ || ✅ | ✅ ||| [Qwen3-Next](../../tutorials/Qwen3-Next.md) |
+| Qwen2.5 | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ |||| ✅ || ✅ | ✅ |||||| [Qwen2.5-7B](../../tutorials/Qwen2.5-7B.md) |
 | Qwen2 | ✅ | |||||||||||||||||||
 | Qwen2-based | ✅ | |||||||||||||||||||
 | QwQ-32B | ✅ | |||||||||||||||||||
@@ -58,10 +58,10 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | Model | Support | Note | BF16 | Supported Hardware | W8A8 | Chunked Prefill | Automatic Prefix Cache | LoRA | Speculative Decoding | Async Scheduling | Tensor Parallel | Pipeline Parallel | Expert Parallel | Data Parallel | Prefill-decode Disaggregation | Piecewise AclGraph | Fullgraph AclGraph | max-model-len | MLP Weight Prefetch | Doc |
 |--------------------------------|---------------|----------------------------------------------------------------------|------|--------------------|------|-----------------|------------------------|------|----------------------|------------------|-----------------|-------------------|-----------------|---------------|-------------------------------|--------------------|--------------------|---------------|---------------------|-----|
 | Qwen2-VL | ✅ | |||||||||||||||||||
-| Qwen2.5-VL | ✅ | |||||||||||||||||||
-| Qwen3-VL | ✅ | ||A2/A3|||||||✅|||||✅|✅||||
+| Qwen2.5-VL | ✅ | | ✅ | A2/A3 | ✅ | ✅ | ✅ ||| ✅ | ✅ |||| ✅ | ✅ | ✅ | 30k || [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
+| Qwen3-VL | ✅ | ||A2/A3|||||||✅|||||✅|✅||| [Qwen-VL-Dense](../../tutorials/Qwen-VL-Dense.md) |
 | Qwen3-VL-MOE | ✅ | |||||||||||||||||||
-| Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |||||||||||||||||||
+| Qwen2.5-Omni | ✅ ||||||||||||||||||| [Qwen2.5-Omni](../../tutorials/Qwen2.5-Omni.md) |
 | QVQ | ✅ | |||||||||||||||||||
 | LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |||||||||||||||||||
 | InternVL2 | ✅ | |||||||||||||||||||
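
A quick way to read the matrix against vLLM's offline API: the sketch below (not part of the patch) exercises two of its columns, "Tensor Parallel" and "max-model-len", for one row the table marks as supported. It assumes vLLM with the vllm-ascend plugin is installed; the model name and TP size are illustrative placeholders, not values taken from the patch.

```python
# Minimal sketch, assuming vLLM plus the vllm-ascend plugin are installed.
# Exercises two matrix columns for one supported row:
#   tensor_parallel_size -> "Tensor Parallel" column
#   max_model_len        -> "max-model-len" column (128k for Qwen3)
# Model name and TP size are placeholders; substitute any supported row.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",   # any model the matrix marks as supported
    tensor_parallel_size=2,  # "Tensor Parallel" column
    max_model_len=131072,    # "max-model-len" column: 128k
)
outputs = llm.generate(["Hello, Ascend!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

Other green cells in a model's row map to engine arguments or deployment modes in the same way, e.g. `pipeline_parallel_size` for the "Pipeline Parallel" column.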