
# Feature Support

The feature support principle of vLLM Ascend is to stay **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.

You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:

| Feature                       | vLLM V0 Engine | vLLM V1 Engine | Next Step                                                              |
|-------------------------------|----------------|----------------|------------------------------------------------------------------------|
| Chunked Prefill               | 🟢 Functional  | 🟢 Functional  | Functional, see detailed note: [Chunked Prefill][cp]                   |
| Automatic Prefix Caching      | 🟢 Functional  | 🟢 Functional  | Functional, see detailed note: [vllm-ascend#732][apc]                  |
| LoRA                          | 🟢 Functional  | 🟢 Functional  | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]          |
| Prompt adapter                | 🔴 No plan     | 🔴 No plan     | This feature has been deprecated by vLLM.                              |
| Speculative decoding          | 🟢 Functional  | 🟢 Functional  | Basic support                                                          |
| Pooling                       | 🟢 Functional  | 🟡 Planned     | CI needed and adapting more models; V1 support relies on vLLM support. |
| Enc-dec                       | 🔴 No plan     | 🟡 Planned     | Planned for 2025.06.30                                                 |
| Multi Modality                | 🟢 Functional  | 🟢 Functional  | [Tutorial][multimodal], optimizing and adapting more models            |
| LogProbs                      | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Prompt logProbs               | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Async output                  | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Multi step scheduler          | 🟢 Functional  | 🔴 Deprecated  | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]     |
| Best of                       | 🟢 Functional  | 🔴 Deprecated  | [vllm#13361][best_of], CI needed                                       |
| Beam search                   | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Guided Decoding               | 🟢 Functional  | 🟢 Functional  | [vllm-ascend#177][guided_decoding]                                     |
| Tensor Parallel               | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Pipeline Parallel             | 🟢 Functional  | 🟢 Functional  | CI needed                                                              |
| Expert Parallel               | 🔴 No plan     | 🟢 Functional  | CI needed; no plan for V0 support                                      |
| Data Parallel                 | 🔴 No plan     | 🟢 Functional  | CI needed; no plan for V0 support                                      |
| Prefill Decode Disaggregation | 🟢 Functional  | 🟢 Functional  | 1P1D available, working on xPyD and V1 support.                        |
| Quantization                  | 🟢 Functional  | 🟢 Functional  | W8A8 available, CI needed; working on more quantization method support |
| Graph Mode                    | 🔴 No plan     | 🔵 Experimental | Experimental, see detailed note: [vllm-ascend#767][graph_mode]         |
| Sleep Mode                    | 🟢 Functional  | 🟢 Functional  | level=1 available, CI needed, working on V1 support                    |

- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support, interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan / Deprecated: No plan to support (mostly on V0), or deprecated by vLLM V1.
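
The table can also be consulted programmatically when deciding which engine to run. The following is a minimal sketch, not part of vLLM Ascend: the `SUPPORT` dictionary is a hand-copied excerpt of a few rows above, and `engines_supporting` and `select_engine` are hypothetical helper names. It assumes the engine is chosen through vLLM's `VLLM_USE_V1` environment variable, which must be set before `vllm` is imported.

```python
import os

# Hand-copied excerpt of the table above: feature -> (V0 status, V1 status).
SUPPORT = {
    "Chunked Prefill": ("Functional", "Functional"),
    "Pooling": ("Functional", "Planned"),
    "Expert Parallel": ("No plan", "Functional"),
    "Graph Mode": ("No plan", "Experimental"),
    "Prompt adapter": ("No plan", "No plan"),
}

# Statuses under which a feature can actually be used today.
USABLE = {"Functional", "Experimental"}


def engines_supporting(feature: str) -> list[str]:
    """Return the engines ("V0", "V1") on which the feature is usable."""
    v0_status, v1_status = SUPPORT[feature]
    pairs = (("V0", v0_status), ("V1", v1_status))
    return [engine for engine, status in pairs if status in USABLE]


def select_engine(feature: str) -> str:
    """Pick the newest engine that supports the feature and export VLLM_USE_V1."""
    engines = engines_supporting(feature)
    if not engines:
        raise ValueError(f"{feature!r} is not usable on any engine")
    engine = engines[-1]  # prefer V1 when both engines qualify
    os.environ["VLLM_USE_V1"] = "1" if engine == "V1" else "0"
    return engine
```

For example, `select_engine("Graph Mode")` returns `"V1"` and sets `VLLM_USE_V1=1`, while `select_engine("Pooling")` falls back to `"V0"` because V1 support is only planned.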

[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
[best_of]: https://github.com/vllm-project/vllm/issues/13361
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
[v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
[v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill