[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?
Update doc

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -1,6 +1,6 @@
-# Features and models
+# Features and Models

-This section provides a detailed supported matrix by vLLM Ascend.
+This section provides a detailed matrix supported by vLLM Ascend.

 :::{toctree}
 :caption: Support Matrix
@@ -1,4 +1,4 @@
-# Feature Support
+# Supported Features

 The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
@@ -6,11 +6,11 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th

 | Feature                       | Status         | Next Step                                                                |
 |-------------------------------|----------------|--------------------------------------------------------------------------|
-| Chunked Prefill               | 🟢 Functional  | Functional, see detail note: [Chunked Prefill][cp]                       |
-| Automatic Prefix Caching      | 🟢 Functional  | Functional, see detail note: [vllm-ascend#732][apc]                      |
+| Chunked Prefill               | 🟢 Functional  | Functional, see detailed note: [Chunked Prefill][cp]                     |
+| Automatic Prefix Caching      | 🟢 Functional  | Functional, see detailed note: [vllm-ascend#732][apc]                    |
 | LoRA                          | 🟢 Functional  | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]            |
 | Speculative decoding          | 🟢 Functional  | Basic support                                                            |
-| Pooling                       | 🟢 Functional  | CI needed and adapting more models; V1 support rely on vLLM support.     |
+| Pooling                       | 🟢 Functional  | CI needed to adapt to more models; V1 support relies on vLLM support.    |
 | Enc-dec                       | 🟡 Planned     | vLLM should support this feature first.                                  |
 | Multi Modality                | 🟢 Functional  | [Tutorial][multimodal], optimizing and adapting more models              |
 | LogProbs                      | 🟢 Functional  | CI needed                                                                |
@@ -18,20 +18,20 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Async output                  | 🟢 Functional  | CI needed                                                                |
 | Beam search                   | 🟢 Functional  | CI needed                                                                |
 | Guided Decoding               | 🟢 Functional  | [vllm-ascend#177][guided_decoding]                                       |
-| Tensor Parallel               | 🟢 Functional  | Make TP >4 work with graph mode                                          |
+| Tensor Parallel               | 🟢 Functional  | Make TP >4 work with graph mode.                                         |
 | Pipeline Parallel             | 🟢 Functional  | Write official guide and tutorial.                                       |
-| Expert Parallel               | 🟢 Functional  | Dynamic EPLB support.                                                    |
+| Expert Parallel               | 🟢 Functional  | Support dynamic EPLB.                                                    |
 | Data Parallel                 | 🟢 Functional  | Data Parallel support for Qwen3 MoE.                                     |
 | Prefill Decode Disaggregation | 🟢 Functional  | Functional, xPyD is supported.                                           |
-| Quantization                  | 🟢 Functional  | W8A8 available; working on more quantization method support(W4A8, etc)   |
-| Graph Mode                    | 🔵 Experimental| Experimental, see detail note: [vllm-ascend#767][graph_mode]             |
+| Quantization                  | 🟢 Functional  | W8A8 available; working on more quantization methods (W4A8, etc.)        |
+| Graph Mode                    | 🔵 Experimental| Experimental, see detailed note: [vllm-ascend#767][graph_mode]           |
 | Sleep Mode                    | 🟢 Functional  |                                                                          |

 - 🟢 Functional: Fully operational, with ongoing optimizations.
 - 🔵 Experimental: Experimental support; interfaces and functions may change.
 - 🚧 WIP: Under active development, will be supported soon.
 - 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
-- 🔴 NO plan / Deprecated: No plan or deprecated by vLLM.
+- 🔴 NO plan/Deprecated: No plan or deprecated by vLLM.

 [v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
 [multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
@@ -1,20 +1,22 @@
-# Model Support
+# Supported Models

-Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/1608
+Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/1608

-## Text-only Language Models
+## Text-Only Language Models

 ### Generative Models

-| Model                         | Supported | Note                                                                 |
+| Model                         | Support   | Note                                                                 |
 |-------------------------------|-----------|----------------------------------------------------------------------|
-| DeepSeek v3                   | ✅        |                                                                      |
+| DeepSeek V3/3.1               | ✅        |                                                                      |
+| DeepSeek V3.2 EXP             | ✅        |                                                                      |
 | DeepSeek R1                   | ✅        |                                                                      |
 | DeepSeek Distill (Qwen/LLama) | ✅        |                                                                      |
 | Qwen3                         | ✅        |                                                                      |
 | Qwen3-based                   | ✅        |                                                                      |
 | Qwen3-Coder                   | ✅        |                                                                      |
 | Qwen3-Moe                     | ✅        |                                                                      |
 | Qwen3-Next                    | ✅        |                                                                      |
 | Qwen2.5                       | ✅        |                                                                      |
 | Qwen2                         | ✅        |                                                                      |
 | Qwen2-based                   | ✅        |                                                                      |
@@ -32,17 +34,17 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | Gemma-3                       | ✅        |                                                                      |
 | Phi-3/4                       | ✅        |                                                                      |
 | Mistral/Mistral-Instruct      | ✅        |                                                                      |
 | GLM-4.5                       | ✅        |                                                                      |
 | GLM-4                         | ❌        | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255)     |
 | GLM-4-0414                    | ❌        | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258)     |
 | ChatGLM                       | ❌        | [#554](https://github.com/vllm-project/vllm-ascend/issues/554)       |
-| DeepSeek v2.5                 | 🟡        | Need test                                                            |
+| DeepSeek V2.5                 | 🟡        | Need test                                                            |
 | Mllama                        | 🟡        | Need test                                                            |
 | MiniMax-Text                  | 🟡        | Need test                                                            |

 ### Pooling Models

-| Model                         | Supported | Note                                                                 |
+| Model                         | Support   | Note                                                                 |
 |-------------------------------|-----------|----------------------------------------------------------------------|
 | Qwen3-Embedding               | ✅        |                                                                      |
 | Molmo                         | ✅        | [1942](https://github.com/vllm-project/vllm-ascend/issues/1942)      |
@@ -52,10 +54,12 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160

 ### Generative Models

-| Model                          | Supported     | Note                                                                 |
+| Model                          | Support       | Note                                                                 |
 |--------------------------------|---------------|----------------------------------------------------------------------|
 | Qwen2-VL                       | ✅            |                                                                      |
 | Qwen2.5-VL                     | ✅            |                                                                      |
 | Qwen3-VL                       | ✅            |                                                                      |
 | Qwen3-VL-MOE                   | ✅            |                                                                      |
 | Qwen2.5-Omni                   | ✅            | [1760](https://github.com/vllm-project/vllm-ascend/issues/1760)      |
 | QVQ                            | ✅            |                                                                      |
 | LLaVA 1.5/1.6                  | ✅            | [1962](https://github.com/vllm-project/vllm-ascend/issues/1962)      |
@@ -76,4 +80,4 @@ Get the newest info here: https://github.com/vllm-project/vllm-ascend/issues/160
 | GLM-4V                         | ❌            | [2260](https://github.com/vllm-project/vllm-ascend/issues/2260)      |
 | InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [2064](https://github.com/vllm-project/vllm-ascend/issues/2064)      |
 | Whisper                        | ❌            | [2262](https://github.com/vllm-project/vllm-ascend/issues/2262)      |
-| Ultravox                       | 🟡 Need test  |                                                                      |
+| Ultravox                       | 🟡            | Need test                                                            |