[Doc] Update user doc index (#1581)

Add user doc index to make the user guide more clear

- vLLM version: v0.9.1
- vLLM main: 49e8c7ea25

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-10 14:26:59 +08:00
parent c7446438a9
commit 3d1e6a5929
16 changed files with 42 additions and 28 deletions
--- a/docs/source/user_guide/support_matrix/index.md
+++ b/docs/source/user_guide/support_matrix/index.md
@@ -0,0 +1,10 @@
+# Features and models
+
+This section provides the detailed support matrix for vLLM Ascend.
+
+:::{toctree}
+:caption: Support Matrix
+:maxdepth: 1
+supported_models
+supported_features
+:::
--- a/docs/source/user_guide/support_matrix/supported_features.md
+++ b/docs/source/user_guide/support_matrix/supported_features.md
@@ -0,0 +1,49 @@
+# Feature Support
+
+vLLM Ascend's feature support principle is to stay **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
+
+You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:
+
+| Feature                       | vLLM V0 Engine | vLLM V1 Engine | Next Step                                                              |
+|-------------------------------|----------------|----------------|------------------------------------------------------------------------|
+| Chunked Prefill               | 🟢 Functional  | 🟢 Functional  | Functional, see detailed note: [Chunked Prefill][cp]                   |
+| Automatic Prefix Caching      | 🟢 Functional  | 🟢 Functional  | Functional, see detailed note: [vllm-ascend#732][apc]                  |
+| LoRA                          | 🟢 Functional  | 🟢 Functional  | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]          |
+| Prompt adapter                | 🔴 NO plan     | 🔴 NO plan     | This feature has been deprecated by vLLM.                              |
+| Speculative decoding          | 🟢 Functional  | 🟢 Functional  | Basic support                                                          |
+| Pooling                       | 🟢 Functional  | 🟡 Planned     | CI needed and adapting more models; V1 support relies on vLLM support. |
+| Enc-dec                       | 🔴 NO plan     | 🟡 Planned     | Planned for 2025.06.30                                                 |
+| Multi Modality                | 🟢 Functional  | 🟢 Functional  | [Tutorial][multimodal], optimizing and adapting more models            |
+| LogProbs                      | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Prompt logProbs               | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Async output                  | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Multi step scheduler          | 🟢 Functional  | 🔴 Deprecated  | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]     |
+| Best of                       | 🟢 Functional  | 🔴 Deprecated  | [vllm#13361][best_of], CI needed                                       |
+| Beam search                   | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Guided Decoding               | 🟢 Functional  | 🟢 Functional  | [vllm-ascend#177][guided_decoding]                                     |
+| Tensor Parallel               | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Pipeline Parallel             | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
+| Expert Parallel               | 🔴 NO plan     | 🟢 Functional  | CI needed; No plan on V0 support                                       |
+| Data Parallel                 | 🔴 NO plan     | 🟢 Functional  | CI needed; No plan on V0 support                                       |
+| Prefill Decode Disaggregation | 🟢 Functional  | 🟢 Functional  | 1P1D available, working on xPyD and V1 support.                        |
+| Quantization                  | 🟢 Functional  | 🟢 Functional  | W8A8 available, CI needed; working on more quantization method support |
+| Graph Mode                    | 🔴 NO plan     | 🔵 Experimental | Experimental, see detailed note: [vllm-ascend#767][graph_mode]         |
+| Sleep Mode                    | 🟢 Functional  | 🟢 Functional  | level=1 available, CI needed, working on V1 support                    |
+
+- 🟢 Functional: Fully operational, with ongoing optimizations.
+- 🔵 Experimental: Experimental support, interfaces and functions may change.
+- 🚧 WIP: Under active development, will be supported soon.
+- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
+- 🔴 NO plan / Deprecated: No plan for V0 support, or deprecated by vLLM V1.
+
+[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
+[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
+[best_of]: https://github.com/vllm-project/vllm/issues/13361
+[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
+[v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
+[v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
+[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
+[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
+[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
+[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
+[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
--- a/docs/source/user_guide/support_matrix/supported_models.md
+++ b/docs/source/user_guide/support_matrix/supported_models.md
@@ -0,0 +1,52 @@
+# Model Support
+
+## Text-only Language Models
+
+### Generative Models
+| Model | Supported | Note |
+|-------|-----------|------|
+| DeepSeek v3 | ✅ | |
+| DeepSeek R1 | ✅ | |
+| DeepSeek Distill (Qwen/Llama) | ✅ | |
+| Qwen3 | ✅ | |
+| Qwen3-MoE | ✅ | |
+| Qwen2.5 | ✅ | |
+| QwQ-32B | ✅ | |
+| Llama3.1/3.2 | ✅ | |
+| InternLM | ✅ | |
+| Baichuan | ✅ | |
+| Phi-4-mini | ✅ | |
+| MiniCPM | ✅ | |
+| MiniCPM3 | ✅ | |
+| Llama4 | ✅ | |
+| Mistral | | Needs testing |
+| DeepSeek v2.5 | | Needs testing |
+| Gemma-2 | | Needs testing |
+| Mllama | | Needs testing |
+| Gemma-3 | ❌ | [#496](https://github.com/vllm-project/vllm-ascend/issues/496) |
+| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |
+
+### Pooling Models
+| Model | Supported | Note |
+|-------|---------|------|
+| XLM-RoBERTa-based | ✅ |  |
+| Molmo | ✅ |  |
+
+
+## Multimodal Language Models
+
+### Generative Models
+| Model | Supported | Note |
+|-------|-----------|------|
+| Qwen2-VL | ✅ | |
+| Qwen2.5-VL | ✅ | |
+| LLaVA 1.5 | ✅ | |
+| LLaVA 1.6 | ✅ | [#553](https://github.com/vllm-project/vllm-ascend/issues/553) |
+| InternVL2 | ✅ | |
+| InternVL2.5 | ✅ | |
+| Qwen2-Audio | ✅ |  |
+| LLaVA-Next | | Needs testing |
+| LLaVA-Next-Video | | Needs testing |
+| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
+| GLM-4v | | Needs testing |
+| Ultravox | | Needs testing |