v0.10.1rc1

This commit is contained in:
2025-09-09 09:40:35 +08:00
parent d6f6ef41fe
commit 9149384e03
432 changed files with 84698 additions and 1 deletions


@@ -0,0 +1,10 @@
# Features and models
This section provides a detailed support matrix for vLLM Ascend.
:::{toctree}
:caption: Support Matrix
:maxdepth: 1
supported_models
supported_features
:::


@@ -0,0 +1,45 @@
# Feature Support
The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
You can check the [support status of the vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:
| Feature | Status | Next Step |
|-------------------------------|----------------|------------------------------------------------------------------------|
| Chunked Prefill               | 🟢 Functional  | See detail note: [Chunked Prefill][cp]                                  |
| Automatic Prefix Caching      | 🟢 Functional  | See detail note: [vllm-ascend#732][apc]                                 |
| LoRA | 🟢 Functional | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora] |
| Speculative decoding | 🟢 Functional | Basic support |
| Pooling                       | 🟢 Functional  | CI needed and adapting more models; V1 support relies on vLLM support.  |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | [Tutorial][multimodal], optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | CI needed |
| Beam search | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | [vllm-ascend#177][guided_decoding] |
| Tensor Parallel | 🟢 Functional | Make TP >4 work with graph mode |
| Pipeline Parallel | 🟢 Functional | Write official guide and tutorial. |
| Expert Parallel | 🟢 Functional | Dynamic EPLB support. |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🟢 Functional  | xPyD is supported.                                                      |
| Quantization                  | 🟢 Functional  | W8A8 available; working on support for more quantization methods (W4A8, etc.) |
| Graph Mode                    | 🔵 Experimental| See detail note: [vllm-ascend#767][graph_mode]                          |
| Sleep Mode | 🟢 Functional | |
- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support, interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 NO plan / Deprecated: No plan or deprecated by vLLM.
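Several of the 🟢 Functional features above are enabled through standard vLLM launch options. A minimal, hypothetical launch sketch (the model name and parallel size are placeholders, assuming a working vLLM Ascend install):

```shell
# Hypothetical example combining several Functional features from the table.
# Model name and --tensor-parallel-size are placeholders; the flags are
# standard vLLM serve options, not vLLM-Ascend-specific additions.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --tensor-parallel-size 4 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --enable-lora
```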
[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill


@@ -0,0 +1,79 @@
# Model Support
Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/1608
## Text-only Language Models
### Generative Models
| Model | Supported | Note |
|-------------------------------|-----------|----------------------------------------------------------------------|
| DeepSeek v3 | ✅ | |
| DeepSeek R1 | ✅ | |
| DeepSeek Distill (Qwen/Llama) | ✅ | |
| Qwen3 | ✅ | |
| Qwen3-based | ✅ | |
| Qwen3-Coder | ✅ | |
| Qwen3-MoE | ✅ | |
| Qwen2.5 | ✅ | |
| Qwen2 | ✅ | |
| Qwen2-based | ✅ | |
| QwQ-32B | ✅ | |
| Llama 2/3/3.1 | ✅ | |
| InternLM | ✅ | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
| Baichuan | ✅ | |
| Baichuan2 | ✅ | |
| Phi-4-mini | ✅ | |
| MiniCPM | ✅ | |
| MiniCPM3 | ✅ | |
| Ernie4.5 | ✅ | |
| Ernie4.5-MoE | ✅ | |
| Gemma-2 | ✅ | |
| Gemma-3 | ✅ | |
| Phi-3/4 | ✅ | |
| Mistral/Mistral-Instruct | ✅ | |
| GLM-4.5 | ✅ | |
| GLM-4 | ❌ | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
| GLM-4-0414 | ❌ | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
| ChatGLM | ❌ | [#554](https://github.com/vllm-project/vllm-ascend/issues/554) |
| DeepSeek v2.5 | 🟡 | Need test |
| Mllama | 🟡 | Need test |
| MiniMax-Text | 🟡 | Need test |
### Pooling Models
| Model | Supported | Note |
|-------------------------------|-----------|----------------------------------------------------------------------|
| Qwen3-Embedding | ✅ | |
| Molmo                         | ✅        | [#1942](https://github.com/vllm-project/vllm-ascend/issues/1942)      |
| XLM-RoBERTa-based             | ❌        | [#1960](https://github.com/vllm-project/vllm-ascend/issues/1960)      |
## Multimodal Language Models
### Generative Models
| Model | Supported | Note |
|--------------------------------|---------------|----------------------------------------------------------------------|
| Qwen2-VL | ✅ | |
| Qwen2.5-VL | ✅ | |
| Qwen2.5-Omni                   | ✅            | [#1760](https://github.com/vllm-project/vllm-ascend/issues/1760)      |
| QVQ | ✅ | |
| LLaVA 1.5/1.6                  | ✅            | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962)      |
| InternVL2 | ✅ | |
| InternVL2.5 | ✅ | |
| Qwen2-Audio | ✅ | |
| Aria | ✅ | |
| LLaVA-Next | ✅ | |
| LLaVA-Next-Video | ✅ | |
| MiniCPM-V | ✅ | |
| Mistral3 | ✅ | |
| Phi-3-Vision/Phi-3.5-Vision    | ✅            |                                                                        |
| Gemma3 | ✅ | |
| Llama 4                        | ❌            | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972)      |
| Llama 3.2                      | ❌            | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972)      |
| Keye-VL-8B-Preview             | ❌            | [#1963](https://github.com/vllm-project/vllm-ascend/issues/1963)      |
| Florence-2                     | ❌            | [#2259](https://github.com/vllm-project/vllm-ascend/issues/2259)      |
| GLM-4V                         | ❌            | [#2260](https://github.com/vllm-project/vllm-ascend/issues/2260)      |
| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌ | [#2064](https://github.com/vllm-project/vllm-ascend/issues/2064)      |
| Whisper                        | ❌            | [#2262](https://github.com/vllm-project/vllm-ascend/issues/2262)      |
| Ultravox                       | 🟡            | Need test                                                              |