v0.10.1rc1

docs/source/user_guide/support_matrix/index.md (new file, 10 lines)
@@ -0,0 +1,10 @@
# Features and models

This section provides a detailed support matrix for vLLM Ascend.

:::{toctree}
:caption: Support Matrix
:maxdepth: 1
supported_models
supported_features
:::
docs/source/user_guide/support_matrix/supported_features.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# Feature Support

The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.

You can check the [support status of the vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:

| Feature                       | Status          | Next Step                                                                      |
|-------------------------------|-----------------|--------------------------------------------------------------------------------|
| Chunked Prefill               | 🟢 Functional   | Functional; see detailed note: [Chunked Prefill][cp]                           |
| Automatic Prefix Caching      | 🟢 Functional   | Functional; see detailed note: [vllm-ascend#732][apc]                          |
| LoRA                          | 🟢 Functional   | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]                  |
| Speculative decoding          | 🟢 Functional   | Basic support                                                                  |
| Pooling                       | 🟢 Functional   | CI needed and adapting more models; V1 support relies on vLLM support          |
| Enc-dec                       | 🟡 Planned      | vLLM should support this feature first                                         |
| Multi Modality                | 🟢 Functional   | [Tutorial][multimodal]; optimizing and adapting more models                    |
| LogProbs                      | 🟢 Functional   | CI needed                                                                      |
| Prompt logProbs               | 🟢 Functional   | CI needed                                                                      |
| Async output                  | 🟢 Functional   | CI needed                                                                      |
| Beam search                   | 🟢 Functional   | CI needed                                                                      |
| Guided Decoding               | 🟢 Functional   | [vllm-ascend#177][guided_decoding]                                             |
| Tensor Parallel               | 🟢 Functional   | Make TP > 4 work with graph mode                                               |
| Pipeline Parallel             | 🟢 Functional   | Write official guide and tutorial                                              |
| Expert Parallel               | 🟢 Functional   | Dynamic EPLB support                                                           |
| Data Parallel                 | 🟢 Functional   | Data Parallel support for Qwen3 MoE                                            |
| Prefill Decode Disaggregation | 🟢 Functional   | Functional; xPyD is supported                                                  |
| Quantization                  | 🟢 Functional   | W8A8 available; working on support for more quantization methods (W4A8, etc.)  |
| Graph Mode                    | 🔵 Experimental | Experimental; see detailed note: [vllm-ascend#767][graph_mode]                 |
| Sleep Mode                    | 🟢 Functional   |                                                                                |

- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support; interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan / Deprecated: No plan, or deprecated by vLLM.
[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
[1P1D]: https://github.com/vllm-project/vllm-ascend/pull/950
[ray]: https://github.com/vllm-project/vllm-ascend/issues/1751
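Most of the features above are enabled through the standard upstream vLLM interface. As a minimal sketch (the model name, parallel size, and flag choice are illustrative, not prescriptive), serving a model with tensor parallelism and automatic prefix caching on an Ascend host looks like:

```shell
# Minimal sketch: launch an OpenAI-compatible server via vLLM Ascend.
# Model and flags are illustrative; feature flags follow upstream vLLM.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 2 \
    --enable-prefix-caching
```

Feature-specific knobs (LoRA adapters, speculative decoding, quantized checkpoints) take the same upstream vLLM arguments; see the linked issues above for Ascend-specific caveats.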
docs/source/user_guide/support_matrix/supported_models.md (new file, 79 lines)
@@ -0,0 +1,79 @@
# Model Support

Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/1608

## Text-only Language Models

### Generative Models

| Model                         | Supported | Note                                                             |
|-------------------------------|-----------|------------------------------------------------------------------|
| DeepSeek v3                   | ✅        |                                                                  |
| DeepSeek R1                   | ✅        |                                                                  |
| DeepSeek Distill (Qwen/Llama) | ✅        |                                                                  |
| Qwen3                         | ✅        |                                                                  |
| Qwen3-based                   | ✅        |                                                                  |
| Qwen3-Coder                   | ✅        |                                                                  |
| Qwen3-Moe                     | ✅        |                                                                  |
| Qwen2.5                       | ✅        |                                                                  |
| Qwen2                         | ✅        |                                                                  |
| Qwen2-based                   | ✅        |                                                                  |
| QwQ-32B                       | ✅        |                                                                  |
| Llama 2/3/3.1                 | ✅        |                                                                  |
| InternLM                      | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
| Baichuan                      | ✅        |                                                                  |
| Baichuan2                     | ✅        |                                                                  |
| Phi-4-mini                    | ✅        |                                                                  |
| MiniCPM                       | ✅        |                                                                  |
| MiniCPM3                      | ✅        |                                                                  |
| Ernie4.5                      | ✅        |                                                                  |
| Ernie4.5-Moe                  | ✅        |                                                                  |
| Gemma-2                       | ✅        |                                                                  |
| Gemma-3                       | ✅        |                                                                  |
| Phi-3/4                       | ✅        |                                                                  |
| Mistral/Mistral-Instruct      | ✅        |                                                                  |
| GLM-4.5                       | ✅        |                                                                  |
| GLM-4                         | ❌        | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
| GLM-4-0414                    | ❌        | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
| ChatGLM                       | ❌        | [#554](https://github.com/vllm-project/vllm-ascend/issues/554)   |
| DeepSeek v2.5                 | 🟡        | Needs testing                                                    |
| Mllama                        | 🟡        | Needs testing                                                    |
| MiniMax-Text                  | 🟡        | Needs testing                                                    |

### Pooling Models

| Model             | Supported | Note                                                             |
|-------------------|-----------|------------------------------------------------------------------|
| Qwen3-Embedding   | ✅        |                                                                  |
| Molmo             | ✅        | [#1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |
| XLM-RoBERTa-based | ❌        | [#1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |

## Multimodal Language Models

### Generative Models

| Model                                               | Supported | Note                                                             |
|-----------------------------------------------------|-----------|------------------------------------------------------------------|
| Qwen2-VL                                            | ✅        |                                                                  |
| Qwen2.5-VL                                          | ✅        |                                                                  |
| Qwen2.5-Omni                                        | ✅        | [#1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |
| QVQ                                                 | ✅        |                                                                  |
| LLaVA 1.5/1.6                                       | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
| InternVL2                                           | ✅        |                                                                  |
| InternVL2.5                                         | ✅        |                                                                  |
| Qwen2-Audio                                         | ✅        |                                                                  |
| Aria                                                | ✅        |                                                                  |
| LLaVA-Next                                          | ✅        |                                                                  |
| LLaVA-Next-Video                                    | ✅        |                                                                  |
| MiniCPM-V                                           | ✅        |                                                                  |
| Mistral3                                            | ✅        |                                                                  |
| Phi-3-Vision/Phi-3.5-Vision                         | ✅        |                                                                  |
| Gemma3                                              | ✅        |                                                                  |
| Llama 4                                             | ❌        | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
| Llama 3.2                                           | ❌        | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
| Keye-VL-8B-Preview                                  | ❌        | [#1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |
| Florence-2                                          | ❌        | [#2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |
| GLM-4V                                              | ❌        | [#2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |
| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌        | [#2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |
| Whisper                                             | ❌        | [#2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |
| Ultravox                                            | 🟡        | Needs testing                                                    |