[ReleaseNote] Release note of v0.10.0rc1 (#2225)

### What this PR does / why we need it?
Add the release note for v0.10.0rc1.

- vLLM version: v0.10.0
- vLLM main: 8e8e0b6af1

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

```diff
@@ -9,7 +9,6 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Chunked Prefill | 🟢 Functional | Functional, see detail note: [Chunked Prefill][cp] |
 | Automatic Prefix Caching | 🟢 Functional | Functional, see detail note: [vllm-ascend#732][apc] |
 | LoRA | 🟢 Functional | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora] |
-| Prompt adapter | 🔴 No plan | This feature has been deprecated by vLLM. |
 | Speculative decoding | 🟢 Functional | Basic support |
 | Pooling | 🟢 Functional | CI needed and adapting more models; V1 support relies on vLLM support. |
 | Enc-dec | 🟡 Planned | vLLM should support this feature first. |
@@ -17,15 +16,13 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | LogProbs | 🟢 Functional | CI needed |
 | Prompt logProbs | 🟢 Functional | CI needed |
 | Async output | 🟢 Functional | CI needed |
-| Multi step scheduler | 🔴 Deprecated | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler] |
-| Best of | 🔴 Deprecated | [vllm#13361][best_of] |
 | Beam search | 🟢 Functional | CI needed |
 | Guided Decoding | 🟢 Functional | [vllm-ascend#177][guided_decoding] |
 | Tensor Parallel | 🟢 Functional | Make TP >4 work with graph mode |
 | Pipeline Parallel | 🟢 Functional | Write official guide and tutorial. |
 | Expert Parallel | 🟢 Functional | Dynamic EPLB support. |
 | Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
-| Prefill Decode Disaggregation | 🚧 WIP | working on [1P1D] and xPyD. |
+| Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
 | Quantization | 🟢 Functional | W8A8 available; working on more quantization methods (W4A8, etc.) |
 | Graph Mode | 🔵 Experimental | Experimental, see detail note: [vllm-ascend#767][graph_mode] |
 | Sleep Mode | 🟢 Functional | |
@@ -38,10 +35,7 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 [v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
 [multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
-[best_of]: https://github.com/vllm-project/vllm/issues/13361
 [guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
-[v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
-[v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
 [multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
 [v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
 [graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
```
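
As a quick orientation for the feature table in the diff above, here is a minimal, hypothetical sketch of how the "Tensor Parallel" and experimental "Graph Mode" rows might be exercised together on Ascend NPUs. It is illustrative, not part of the release note: the model name is a placeholder, `tensor_parallel_size` is a standard vLLM argument, and passing `torchair_graph_config` via `additional_config` is an assumption based on vllm-ascend#767; check the current vllm-ascend documentation before relying on it.

```python
# Hypothetical sketch, not taken from this release note.
# Combines two rows of the table above:
#   - "Tensor Parallel" (🟢 Functional): tensor_parallel_size is a standard vLLM arg.
#   - "Graph Mode" (🔵 Experimental): the torchair_graph_config key is an
#     assumption based on vllm-ascend#767; verify before use.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",   # placeholder model, chosen only for illustration
    tensor_parallel_size=2,  # shard the model across 2 NPUs
    additional_config={
        # Assumed vllm-ascend-specific config for the experimental graph mode.
        "torchair_graph_config": {"enabled": True},
    },
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What does vllm-ascend add on top of vLLM?"], params)
print(outputs[0].outputs[0].text)
```

Omitting `additional_config` entirely falls back to the default eager execution path, which is what the 🟢 Functional rows of the table describe.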