v0.10.1rc1

docs/source/user_guide/support_matrix/index.md (new file, 10 lines)
@@ -0,0 +1,10 @@
# Features and models

This section provides a detailed support matrix for vLLM Ascend.

:::{toctree}
:caption: Support Matrix
:maxdepth: 1
supported_models
supported_features
:::
docs/source/user_guide/support_matrix/supported_features.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# Feature Support

The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.

You can check the [support status of the vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:

| Feature                       | Status          | Next Step                                                                      |
|-------------------------------|-----------------|--------------------------------------------------------------------------------|
| Chunked Prefill               | 🟢 Functional   | Functional; see detailed note: [Chunked Prefill][cp]                           |
| Automatic Prefix Caching      | 🟢 Functional   | Functional; see detailed note: [vllm-ascend#732][apc]                          |
| LoRA                          | 🟢 Functional   | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]                  |
| Speculative decoding          | 🟢 Functional   | Basic support                                                                  |
| Pooling                       | 🟢 Functional   | CI needed and adapting more models; V1 support relies on vLLM support          |
| Enc-dec                       | 🟡 Planned      | vLLM should support this feature first                                         |
| Multi Modality                | 🟢 Functional   | [Tutorial][multimodal]; optimizing and adapting more models                    |
| LogProbs                      | 🟢 Functional   | CI needed                                                                      |
| Prompt logProbs               | 🟢 Functional   | CI needed                                                                      |
| Async output                  | 🟢 Functional   | CI needed                                                                      |
| Beam search                   | 🟢 Functional   | CI needed                                                                      |
| Guided Decoding               | 🟢 Functional   | [vllm-ascend#177][guided_decoding]                                             |
| Tensor Parallel               | 🟢 Functional   | Make TP > 4 work with graph mode                                               |
| Pipeline Parallel             | 🟢 Functional   | Write official guide and tutorial                                              |
| Expert Parallel               | 🟢 Functional   | Dynamic EPLB support                                                           |
| Data Parallel                 | 🟢 Functional   | Data Parallel support for Qwen3 MoE                                            |
| Prefill Decode Disaggregation | 🟢 Functional   | Functional; xPyD is supported                                                  |
| Quantization                  | 🟢 Functional   | W8A8 available; working on support for more quantization methods (W4A8, etc.)  |
| Graph Mode                    | 🔵 Experimental | Experimental; see detailed note: [vllm-ascend#767][graph_mode]                 |
| Sleep Mode                    | 🟢 Functional   |                                                                                |

- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support; interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan / Deprecated: No plan, or deprecated by vLLM.
[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
[1P1D]: https://github.com/vllm-project/vllm-ascend/pull/950
[ray]: https://github.com/vllm-project/vllm-ascend/issues/1751
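Most of the features above are enabled through the standard upstream vLLM interface. As a minimal sketch (the model name, parallel size, and flag choice are illustrative, not prescriptive), serving a model with tensor parallelism and automatic prefix caching on an Ascend host looks like:

```shell
# Minimal sketch: launch an OpenAI-compatible server via vLLM Ascend.
# Model and flags are illustrative; feature flags follow upstream vLLM.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 2 \
    --enable-prefix-caching
```

Feature-specific knobs (LoRA adapters, speculative decoding, quantized checkpoints) take the same upstream vLLM arguments; see the linked issues above for Ascend-specific caveats.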
docs/source/user_guide/support_matrix/supported_models.md (new file, 79 lines)
@@ -0,0 +1,79 @@
# Model Support

Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/1608

## Text-only Language Models

### Generative Models

| Model                         | Supported | Note                                                             |
|-------------------------------|-----------|------------------------------------------------------------------|
| DeepSeek v3                   | ✅        |                                                                  |
| DeepSeek R1                   | ✅        |                                                                  |
| DeepSeek Distill (Qwen/Llama) | ✅        |                                                                  |
| Qwen3                         | ✅        |                                                                  |
| Qwen3-based                   | ✅        |                                                                  |
| Qwen3-Coder                   | ✅        |                                                                  |
| Qwen3-Moe                     | ✅        |                                                                  |
| Qwen2.5                       | ✅        |                                                                  |
| Qwen2                         | ✅        |                                                                  |
| Qwen2-based                   | ✅        |                                                                  |
| QwQ-32B                       | ✅        |                                                                  |
| Llama 2/3/3.1                 | ✅        |                                                                  |
| InternLM                      | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
| Baichuan                      | ✅        |                                                                  |
| Baichuan2                     | ✅        |                                                                  |
| Phi-4-mini                    | ✅        |                                                                  |
| MiniCPM                       | ✅        |                                                                  |
| MiniCPM3                      | ✅        |                                                                  |
| Ernie4.5                      | ✅        |                                                                  |
| Ernie4.5-Moe                  | ✅        |                                                                  |
| Gemma-2                       | ✅        |                                                                  |
| Gemma-3                       | ✅        |                                                                  |
| Phi-3/4                       | ✅        |                                                                  |
| Mistral/Mistral-Instruct      | ✅        |                                                                  |
| GLM-4.5                       | ✅        |                                                                  |
| GLM-4                         | ❌        | [#2255](https://github.com/vllm-project/vllm-ascend/issues/2255) |
| GLM-4-0414                    | ❌        | [#2258](https://github.com/vllm-project/vllm-ascend/issues/2258) |
| ChatGLM                       | ❌        | [#554](https://github.com/vllm-project/vllm-ascend/issues/554)   |
| DeepSeek v2.5                 | 🟡        | Needs testing                                                    |
| Mllama                        | 🟡        | Needs testing                                                    |
| MiniMax-Text                  | 🟡        | Needs testing                                                    |

### Pooling Models

| Model             | Supported | Note                                                             |
|-------------------|-----------|------------------------------------------------------------------|
| Qwen3-Embedding   | ✅        |                                                                  |
| Molmo             | ✅        | [#1942](https://github.com/vllm-project/vllm-ascend/issues/1942) |
| XLM-RoBERTa-based | ❌        | [#1960](https://github.com/vllm-project/vllm-ascend/issues/1960) |

## Multimodal Language Models

### Generative Models

| Model                                               | Supported | Note                                                             |
|-----------------------------------------------------|-----------|------------------------------------------------------------------|
| Qwen2-VL                                            | ✅        |                                                                  |
| Qwen2.5-VL                                          | ✅        |                                                                  |
| Qwen2.5-Omni                                        | ✅        | [#1760](https://github.com/vllm-project/vllm-ascend/issues/1760) |
| QVQ                                                 | ✅        |                                                                  |
| LLaVA 1.5/1.6                                       | ✅        | [#1962](https://github.com/vllm-project/vllm-ascend/issues/1962) |
| InternVL2                                           | ✅        |                                                                  |
| InternVL2.5                                         | ✅        |                                                                  |
| Qwen2-Audio                                         | ✅        |                                                                  |
| Aria                                                | ✅        |                                                                  |
| LLaVA-Next                                          | ✅        |                                                                  |
| LLaVA-Next-Video                                    | ✅        |                                                                  |
| MiniCPM-V                                           | ✅        |                                                                  |
| Mistral3                                            | ✅        |                                                                  |
| Phi-3-Vision/Phi-3.5-Vision                         | ✅        |                                                                  |
| Gemma3                                              | ✅        |                                                                  |
| Llama 4                                             | ❌        | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
| Llama 3.2                                           | ❌        | [#1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |
| Keye-VL-8B-Preview                                  | ❌        | [#1963](https://github.com/vllm-project/vllm-ascend/issues/1963) |
| Florence-2                                          | ❌        | [#2259](https://github.com/vllm-project/vllm-ascend/issues/2259) |
| GLM-4V                                              | ❌        | [#2260](https://github.com/vllm-project/vllm-ascend/issues/2260) |
| InternVL2.0/2.5/3.0<br>InternVideo2.5/Mono-InternVL | ❌        | [#2064](https://github.com/vllm-project/vllm-ascend/issues/2064) |
| Whisper                                             | ❌        | [#2262](https://github.com/vllm-project/vllm-ascend/issues/2262) |
| Ultravox                                            | 🟡        | Needs testing                                                    |