[v0.11.0][Doc] Update doc (#3852)

### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Author: zhangxinyuehfad
Date: 2025-10-29 11:32:12 +08:00
Committed by: GitHub
Parent: 6188450269
Commit: 75de3fa172
49 changed files with 724 additions and 701 deletions


@@ -1,6 +1,6 @@
-# Features and models
+# Features and Models
-This section provides a detailed supported matrix by vLLM Ascend.
+This section provides a detailed matrix supported by vLLM Ascend.
:::{toctree}
:caption: Support Matrix


@@ -1,4 +1,4 @@
-# Feature Support
+# Supported Features
The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
@@ -6,11 +6,11 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
| Feature | Status | Next Step |
|-------------------------------|----------------|------------------------------------------------------------------------|
-| Chunked Prefill              | 🟢 Functional | Functional, see detail note: [Chunked Prefill][cp] |
-| Automatic Prefix Caching     | 🟢 Functional | Functional, see detail note: [vllm-ascend#732][apc] |
+| Chunked Prefill              | 🟢 Functional | Functional, see detailed note: [Chunked Prefill][cp] |
+| Automatic Prefix Caching     | 🟢 Functional | Functional, see detailed note: [vllm-ascend#732][apc] |
| LoRA | 🟢 Functional | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora] |
| Speculative decoding | 🟢 Functional | Basic support |
-| Pooling                      | 🟢 Functional | CI needed and adapting more models; V1 support rely on vLLM support. |
+| Pooling                      | 🟢 Functional | CI needed to adapt to more models; V1 support relies on vLLM support. |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | [Tutorial][multimodal], optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
@@ -18,20 +18,20 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
| Async output | 🟢 Functional | CI needed |
| Beam search | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | [vllm-ascend#177][guided_decoding] |
-| Tensor Parallel              | 🟢 Functional | Make TP >4 work with graph mode |
+| Tensor Parallel              | 🟢 Functional | Make TP >4 work with graph mode. |
| Pipeline Parallel | 🟢 Functional | Write official guide and tutorial. |
-| Expert Parallel              | 🟢 Functional | Dynamic EPLB support. |
+| Expert Parallel              | 🟢 Functional | Support dynamic EPLB. |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
-| Quantization                 | 🟢 Functional | W8A8 available; working on more quantization method support(W4A8, etc) |
-| Graph Mode                   | 🔵 Experimental| Experimental, see detail note: [vllm-ascend#767][graph_mode] |
+| Quantization                 | 🟢 Functional | W8A8 available; working on support for more quantization methods (W4A8, etc.) |
+| Graph Mode                   | 🔵 Experimental | Experimental, see detailed note: [vllm-ascend#767][graph_mode] |
| Sleep Mode | 🟢 Functional | |
- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support, interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
-- 🔴 NO plan / Deprecated: No plan or deprecated by vLLM.
+- 🔴 NO plan/Deprecated: No plan or deprecated by vLLM.
[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html


@@ -1,20 +1,22 @@
-# Model Support
+# Supported Models
-Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/1608
+Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/1608
-## Text-only Language Models
+## Text-Only Language Models
### Generative Models
-| Model                         | Supported | Note |
+| Model                         | Support   | Note |
|-------------------------------|-----------|----------------------------------------------------------------------|
-| DeepSeek v3                   | ✅        | |
+| DeepSeek V3/3.1               | ✅        | |
+| DeepSeek V3.2 EXP             | ✅        | |
| DeepSeek R1 | ✅ | |
| DeepSeek Distill (Qwen/Llama) | ✅ | |
| Qwen3 | ✅ | |
| Qwen3-based | ✅ | |
| Qwen3-Coder | ✅ | |
| Qwen3-Moe | ✅ | |
| Qwen3-Next | ✅ | |
| Qwen2.5 | ✅ | |
| Qwen2 | ✅ | |
| Qwen2-based | ✅ | |
@@ -32,17 +34,17 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160
| Gemma-3 | ✅ | |
| Phi-3/4 | ✅ | |
| Mistral/Mistral-Instruct | ✅ | |
-| GLM-4.5                       | ✅       | |
+| GLM-4.5                       | ✅        | |
| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |
-| DeepSeek v2.5                 | 🟡        | Need test |
+| DeepSeek V2.5                 | 🟡        | Need test |
| Mllama | 🟡 | Need test |
| MiniMax-Text | 🟡 | Need test |
### Pooling Models
-| Model                         | Supported | Note |
+| Model                         | Support   | Note |
|-------------------------------|-----------|----------------------------------------------------------------------|
| Qwen3-Embedding | ✅ | |
| Molmo | ✅ | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |
@@ -52,10 +54,12 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160
### Generative Models
-| Model                          | Supported     | Note |
+| Model                          | Support       | Note |
|--------------------------------|---------------|----------------------------------------------------------------------|
| Qwen2-VL | ✅ | |
| Qwen2.5-VL | ✅ | |
+| Qwen3-VL                       | ✅            | |
+| Qwen3-VL-MOE                   | ✅            | |
| Qwen2.5-Omni | ✅ | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |
| QVQ | ✅ | |
| LLaVA 1.5/1.6 | ✅ | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
@@ -76,4 +80,4 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160
| GLM-4V | ❌ | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |
| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |
| Whisper | ❌ | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |
-| Ultravox                       | 🟡 Need test  | |
+| Ultravox                       | 🟡            | Need test |