[ReleaseNote] Release note of v0.10.0rc1 (#2225)

### What this PR does / why we need it?
Release note of v0.10.0rc1

- vLLM version: v0.10.0
- vLLM main: 8e8e0b6af1

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-07 14:46:49 +08:00
parent 58c8d4fdcd
commit 4604882a3e
10 changed files with 100 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l

 ## Prerequisites

- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series, Atlas 800I A3 Inference series, Atlas A3 Training series, Atlas 300I Duo (Experimental)
 - OS: Linux
 - Software:
  * Python >= 3.9, < 3.12
@@ -51,7 +51,7 @@ Please use the following recommended versions to get started quickly:

 | Version    | Release type | Doc                                  |
 |------------|--------------|--------------------------------------|
-|v0.9.2rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.10.0rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
 |v0.9.1rc2|Next stable release|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
 |v0.7.3.post1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/stable/installation.html) for more details|

@@ -73,7 +73,7 @@ Below is maintained branches:

 | Branch     | Status       | Note                                 |
 |------------|--------------|--------------------------------------|
-| main       | Maintained   | CI commitment for vLLM main branch and vLLM 0.9.x branch   |
+| main       | Maintained   | CI commitment for vLLM main branch and vLLM 0.10.x branch   |
 | v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
 | v0.7.3-dev | Maintained   | CI commitment for vLLM 0.7.3 version; only bug fixes are allowed and there will be no new release tags. |
 | v0.9.1-dev | Maintained   | CI commitment for vLLM 0.9.1 version |
--- a/README.zh.md
+++ b/README.zh.md
@@ -37,7 +37,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP

 ## Prerequisites

- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series, Atlas 800I A3 Inference series, Atlas A3 Training series, Atlas 300I Duo (Experimental)
 - OS: Linux
 - Software:
  * Python >= 3.9, < 3.12
@@ -51,7 +51,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP

 | Version    | Release type | Doc                                  |
 |------------|--------------|--------------------------------------|
-|v0.9.2rc1| Latest release candidate |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.10.0rc1| Latest release candidate |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
 |v0.9.1rc2| Next stable release |See [QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
 |v0.7.3.post1| Latest stable version |See [QuickStart](https://vllm-ascend.readthedocs.io/en/stable/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/stable/installation.html) for more details|

--- a/docs/source/community/contributors.md
+++ b/docs/source/community/contributors.md
@@ -17,6 +17,28 @@ Updated on 2025-06-10:

 | Number | Contributor | Date | Commit ID |
 |:------:|:-----------:|:-----:|:---------:|
+| 105 | [@SlightwindSec](https://github.com/SlightwindSec) | 2025/8/5 | [f3b50c5](https://github.com/vllm-project/vllm-ascend/commit/f3b50c54e8243ad8ccefb9b033277fbdd382a9c4) |
+| 104 | [@CaveNightingale](https://github.com/CaveNightingale) | 2025/8/4 | [957c7f1](https://github.com/vllm-project/vllm-ascend/commit/957c7f108d5f0aea230220ccdc18d657229e4030) |
+| 103 | [@underfituu](https://github.com/underfituu) | 2025/8/4 | [e38fab0](https://github.com/vllm-project/vllm-ascend/commit/e38fab011d0b81f3a8e40d9bbe263c283dd4129b) |
+| 102 | [@yangqinghao-cmss](https://github.com/yangqinghao-cmss) | 2025/8/1 | [99fa0ac](https://github.com/vllm-project/vllm-ascend/commit/99fa0ac882c79ae9282940125b042a44ea422757) |
+| 101 | [@pjgao](https://github.com/pjgao) | 2025/7/31 | [6192bc9](https://github.com/vllm-project/vllm-ascend/commit/6192bc95c0e47097836e9be1f30f2a0a6fdca088) |
+| 100 | [@Liccol](https://github.com/Liccol) | 2025/7/31 | [7c90ba5](https://github.com/vllm-project/vllm-ascend/commit/7c90ba5fe8e420b891fdd30df050a33e3767835d) |
+| 99 | [@1024daniel](https://github.com/1024daniel) | 2025/7/31 | [db310c6](https://github.com/vllm-project/vllm-ascend/commit/db310c6ec97b056296f7c2348b90c1d96d0b562a) |
+| 98 | [@zhoux77899](https://github.com/zhoux77899) | 2025/7/30 | [4fcca13](https://github.com/vllm-project/vllm-ascend/commit/4fcca137a70c11daa4070ae014288be154715939) |
+| 97 | [@YuanCheng-coder](https://github.com/YuanCheng-coder) | 2025/7/30 | [34dd24a](https://github.com/vllm-project/vllm-ascend/commit/34dd24adf21fb85a2c413292754b1599832efae2) |
+| 96 | [@hongfugui](https://github.com/hongfugui) | 2025/7/30 | [1dbb888](https://github.com/vllm-project/vllm-ascend/commit/1dbb8882759e4326f5706f6e610674423376c2f3) |
+| 95 | [@Irving11-BKN](https://github.com/Irving11-BKN) | 2025/7/29 | [ca8007f](https://github.com/vllm-project/vllm-ascend/commit/ca8007f584141d3a59b2bcbd4f8ba269c9b7e252) |
+| 94 | [@taoxudonghaha](https://github.com/taoxudonghaha) | 2025/7/29 | [540336e](https://github.com/vllm-project/vllm-ascend/commit/540336edc9db09072a9aaa486fbf7ce625da5b9e) |
+| 93 | [@loukong33](https://github.com/loukong33) | 2025/7/28 | [1a25b0a](https://github.com/vllm-project/vllm-ascend/commit/1a25b0a2ddb23bf4d731ebac4503efaf237b191f) |
+| 92 | [@Ronald1995](https://github.com/Ronald1995) | 2025/7/25 | [e561a2c](https://github.com/vllm-project/vllm-ascend/commit/e561a2c6ec4493b490b13a4a9007d8f451ae0d0f) |
+| 91 | [@ZrBac](https://github.com/ZrBac) | 2025/7/24 | [2ffe051](https://github.com/vllm-project/vllm-ascend/commit/2ffe051859d585df8353d1b9eefb64c44078175a) |
+| 90 | [@SunnyLee151064](https://github.com/SunnyLee151064) | 2025/7/24 | [34571ea](https://github.com/vllm-project/vllm-ascend/commit/34571ea5ae69529758edf75f0252f86ccb4c7184) |
+| 89 | [@shiyuan680](https://github.com/shiyuan680) | 2025/7/23 | [ac0bf13](https://github.com/vllm-project/vllm-ascend/commit/ac0bf133f47ead20f18bf71f9be6dbe05fbd218f) |
+| 88 | [@aidoczh](https://github.com/aidoczh) | 2025/7/21 | [c32eea9](https://github.com/vllm-project/vllm-ascend/commit/c32eea96b73d26268070f57ef98416decc98aff7) |
+| 87 | [@nuclearwu](https://github.com/nuclearwu) | 2025/7/20 | [54f2b31](https://github.com/vllm-project/vllm-ascend/commit/54f2b311848badc86371d269140e729012a60f2c) |
+| 86 | [@pkking](https://github.com/pkking) | 2025/7/18 | [3e39d72](https://github.com/vllm-project/vllm-ascend/commit/3e39d7234c0e5c66b184c136c602e87272b5a36e) |
+| 85 | [@lianyiibo](https://github.com/lianyiibo) | 2025/7/18 | [53d2ea3](https://github.com/vllm-project/vllm-ascend/commit/53d2ea3789ffce32bf3ceb055d5582d28eadc6c7) |
+| 84 | [@xudongLi-cmss](https://github.com/xudongLi-cmss) | 2025/7/2 | [7fc1a98](https://github.com/vllm-project/vllm-ascend/commit/7fc1a984890bd930f670deedcb2dda3a46f84576) |
 | 83 | [@ZhengWG](https://github.com/) | 2025/7/7 | [3a469de](https://github.com/vllm-project/vllm-ascend/commit/9c886d0a1f0fc011692090b0395d734c83a469de) |
 | 82 | [@wm901115nwpu](https://github.com/) | 2025/7/7 | [a2a47d4](https://github.com/vllm-project/vllm-ascend/commit/f08c4f15a27f0f27132f4ca7a0c226bf0a2a47d4) |
 | 81 | [@Agonixiaoxiao](https://github.com/) | 2025/7/2 | [6f84576](https://github.com/vllm-project/vllm-ascend/commit/7fc1a984890bd930f670deedcb2dda3a46f84576) |
--- a/docs/source/community/versioning_policy.md
+++ b/docs/source/community/versioning_policy.md
@@ -22,6 +22,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:

 | vLLM Ascend | vLLM         | Python           | Stable CANN | PyTorch/torch_npu  | MindIE Turbo |
 |-------------|--------------|------------------|-------------|--------------------|--------------|
+| v0.10.0rc1  | v0.10.0      | >= 3.9, < 3.12   | 8.2.RC1     | 2.7.1 / 2.7.1.dev20250724            |              |
 | v0.9.2rc1   | v0.9.2       | >= 3.9, < 3.12   | 8.1.RC1     | 2.5.1 / 2.5.1.post1.dev20250619      |              |
 | v0.9.1rc2   | v0.9.1       | >= 3.9, < 3.12   | 8.1.RC1     | 2.5.1 / 2.5.1.post1|              |
 | v0.9.1rc1   | v0.9.1       | >= 3.9, < 3.12   | 8.1.RC1     | 2.5.1 / 2.5.1.post1.dev20250528      |              |
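
The new v0.10.0rc1 row can be turned into a quick environment check. The following is a minimal sketch, not part of vLLM Ascend: the `REQUIREMENTS` dict and the `python_supported` helper are hypothetical names, and only the version values are taken from the matrix above.

```python
import sys

# Values copied from the v0.10.0rc1 row of the compatibility matrix.
# The dict layout and the helper name are illustrative assumptions.
REQUIREMENTS = {
    "vllm": "0.10.0",
    "python_min": (3, 9),   # inclusive lower bound
    "python_max": (3, 12),  # exclusive upper bound
    "cann": "8.2.RC1",
    "torch": "2.7.1",
    "torch_npu": "2.7.1.dev20250724",
}


def python_supported(version=None):
    """Return True if (major, minor) satisfies >= 3.9 and < 3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return REQUIREMENTS["python_min"] <= major_minor < REQUIREMENTS["python_max"]


if __name__ == "__main__":
    print("Python supported:", python_supported())
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"3.10" < "3.9"`.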
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -65,15 +65,15 @@ myst_substitutions = {
    # the branch of vllm, used in vllm clone
    # - main branch: 'main'
    # - vX.Y.Z branch: 'vX.Y.Z'
-    'vllm_version': 'v0.9.2',
+    'vllm_version': 'v0.10.0',
    # the branch of vllm-ascend, used in vllm-ascend clone and image tag
    # - main branch: 'main'
    # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.9.2rc1',
+    'vllm_ascend_version': 'v0.10.0rc1',
    # the newest release version of vllm-ascend and matched vLLM, used in pip install.
    # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.9.2rc1",
-    'pip_vllm_version': "0.9.2",
+    'pip_vllm_ascend_version': "0.10.0rc1",
+    'pip_vllm_version': "0.10.0",
    # CANN image tag
    'cann_image_tag': "8.2.rc1-910b-ubuntu22.04-py3.11",
    # vllm version in ci
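
The four version entries changed in this hunk have to move together at every release. As a hedged illustration, the sketch below checks that each pip version equals its tag without the leading `v`. That convention is inferred from the values in this diff rather than from a stated policy, and `check_versions` is a hypothetical helper, not something present in `conf.py`.

```python
# Values copied from the updated myst_substitutions entries in this diff.
substitutions = {
    "vllm_version": "v0.10.0",
    "vllm_ascend_version": "v0.10.0rc1",
    "pip_vllm_ascend_version": "0.10.0rc1",
    "pip_vllm_version": "0.10.0",
}

# (tag key, pip key) pairs that are assumed to describe the same release.
PAIRS = [
    ("vllm_version", "pip_vllm_version"),
    ("vllm_ascend_version", "pip_vllm_ascend_version"),
]


def check_versions(subs):
    """Return a list of human-readable mismatches (empty if consistent)."""
    problems = []
    for tag_key, pip_key in PAIRS:
        tag = subs[tag_key]
        if tag == "main":
            # The main branch has no release tag to compare against.
            continue
        expected = tag[1:] if tag.startswith("v") else tag
        if expected != subs[pip_key]:
            problems.append(f"{pip_key}={subs[pip_key]!r} does not match {tag_key}={tag!r}")
    return problems


if __name__ == "__main__":
    print(check_versions(substitutions) or "versions are consistent")
```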
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -4,17 +4,19 @@

 - [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
 - [[v0.9.1rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1487)
- [[v0.9.2rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1742)
+- [[v0.10.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2217)

 ## General FAQs

 ### 1. What devices are currently supported?

-Currently, **ONLY** Atlas A2 series(Ascend-cann-kernels-910b) and Atlas 300I(Ascend-cann-kernels-310p) series are supported:
+Currently, **ONLY** Atlas A2 series (Ascend-cann-kernels-910b), Atlas A3 series (Atlas-A3-cann-kernels) and Atlas 300I series (Ascend-cann-kernels-310p) are supported:

 - Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
 - Atlas 800I A2 Inference series (Atlas 800I A2)
- Atlas 300I Inference series (Atlas 300I Duo)
+- Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD)
+- Atlas 800I A3 Inference series (Atlas 800I A3)
+- [Experimental] Atlas 300I Inference series (Atlas 300I Duo)

 Below series are NOT supported yet:
 - Atlas 200I A2 (Ascend-cann-kernels-310b) unplanned yet
--- a/docs/source/quick_start.md
+++ b/docs/source/quick_start.md
@@ -5,6 +5,9 @@
 ### Supported Devices
 - Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
 - Atlas 800I A2 Inference series (Atlas 800I A2)
+- Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD)
+- Atlas 800I A3 Inference series (Atlas 800I A3)
+- [Experimental] Atlas 300I Inference series (Atlas 300I Duo)

 ## Setup environment using container

--- a/docs/source/tutorials/multi_node.md
+++ b/docs/source/tutorials/multi_node.md
@@ -6,6 +6,7 @@ vLLM-Ascend now supports Data Parallel (DP) deployment, enabling model weights t
 Each DP rank is deployed as a separate “core engine” process which communicates with front-end process(es) via ZMQ sockets. Data Parallel can be combined with Tensor Parallel, in which case each DP engine owns a number of per-NPU worker processes equal to the TP size.

 For Mixture-of-Experts (MoE) models — especially advanced architectures like DeepSeek that utilize Multi-head Latent Attention (MLA) — a hybrid parallelism approach is recommended:
+
 - Use **Data Parallelism (DP)** for attention layers, which are replicated across devices and handle separate batches.
 - Use **Expert or Tensor Parallelism (EP/TP)** for expert layers, which are sharded across devices to distribute the computation.

--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,61 @@
 # Release note

+## v0.10.0rc1 - 2025.08.07
+
+This is the 1st release candidate of v0.10.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started. The V0 engine is completely removed in this version.
+
+### Highlights
+* Disaggregated prefill now works with the V1 engine. You can try it with the DeepSeek model [#950](https://github.com/vllm-project/vllm-ascend/pull/950), following this [tutorial](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/README.md).
+* The W4A8 quantization method is now supported for dense and MoE models. [#2060](https://github.com/vllm-project/vllm-ascend/pull/2060) [#2172](https://github.com/vllm-project/vllm-ascend/pull/2172)
+
+### Core
+* Ascend PyTorch adapter (torch_npu) has been upgraded to `2.7.1.dev20250724` [#1562](https://github.com/vllm-project/vllm-ascend/pull/1562), and CANN has been upgraded to `8.2.RC1` [#1653](https://github.com/vllm-project/vllm-ascend/pull/1653). Don't forget to update them in your environment or use the latest images.
+* vLLM Ascend now works on Atlas 800I A3, and A3 images will be released from this version onward. [#1582](https://github.com/vllm-project/vllm-ascend/pull/1582)
+* Kimi-K2 with w8a8 quantization, Qwen3-Coder and GLM-4.5 are supported in vLLM Ascend. Please follow this [tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_kimi.md.html) to try them. [#2162](https://github.com/vllm-project/vllm-ascend/pull/2162)
+* Pipeline Parallelism is supported in V1 now. [#1800](https://github.com/vllm-project/vllm-ascend/pull/1800)
+* The prefix cache feature now works with the Ascend Scheduler. [#1446](https://github.com/vllm-project/vllm-ascend/pull/1446)
+* Torchair graph mode works with tp > 4 now. [#1508](https://github.com/vllm-project/vllm-ascend/issues/1508)
+* MTP now supports torchair graph mode. [#2145](https://github.com/vllm-project/vllm-ascend/pull/2145)
+
+### Other
+
+* Bug fixes:
+    * Fix functional problem of multi-modality models like Qwen2-audio with Aclgraph. [#1803](https://github.com/vllm-project/vllm-ascend/pull/1803)
+    * Fix the process group creation error in the external launch scenario. [#1681](https://github.com/vllm-project/vllm-ascend/pull/1681)
+    * Fix the functional problem with guided decoding. [#2022](https://github.com/vllm-project/vllm-ascend/pull/2022)
+    * Fix the accuracy issue with common MoE models in DP scenario. [#1856](https://github.com/vllm-project/vllm-ascend/pull/1856)
+* Performance improvements from a number of PRs:
+    * Cache sin/cos instead of calculating them in every layer. [#1890](https://github.com/vllm-project/vllm-ascend/pull/1890)
+    * Improve shared expert multi-stream parallelism. [#1891](https://github.com/vllm-project/vllm-ascend/pull/1891)
+    * Implement the fusion of allreduce and matmul in the prefill phase when TP is enabled. Enable this feature by setting `VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE` to `1`. [#1926](https://github.com/vllm-project/vllm-ascend/pull/1926)
+    * Optimize Quantized MoE Performance by Reducing All2All Communication. [#2195](https://github.com/vllm-project/vllm-ascend/pull/2195)
+    * Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance [#1806](https://github.com/vllm-project/vllm-ascend/pull/1806)
+    * Use multicast to avoid padding decode request to prefill size [#1555](https://github.com/vllm-project/vllm-ascend/pull/1555)
+    * The performance of LoRA has been improved. [#1884](https://github.com/vllm-project/vllm-ascend/pull/1884)
+* A batch of refactoring PRs to improve the code architecture:
+    * Torchair model runner refactor [#2205](https://github.com/vllm-project/vllm-ascend/pull/2205)
+    * Refactoring forward_context and model_runner_v1. [#1979](https://github.com/vllm-project/vllm-ascend/pull/1979)
+    * Refactor AscendMetaData Comments. [#1967](https://github.com/vllm-project/vllm-ascend/pull/1967)
+    * Refactor torchair utils. [#1892](https://github.com/vllm-project/vllm-ascend/pull/1892)
+    * Refactor torchair worker. [#1885](https://github.com/vllm-project/vllm-ascend/pull/1885)
+    * Register activation customop instead of overwriting forward_oot. [#1841](https://github.com/vllm-project/vllm-ascend/pull/1841)
+* Parameter changes:
+    * `expert_tensor_parallel_size` in `additional_config` has been removed; EP and TP are now aligned with vLLM. [#1681](https://github.com/vllm-project/vllm-ascend/pull/1681)
+    * Add the `VLLM_ASCEND_MLA_PA` environment variable. Use it to enable the MLA paged attention operator for DeepSeek MLA decode.
+    * Add the `VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE` environment variable. It enables the `MatmulAllReduce` fusion kernel when tensor parallel is enabled. This feature is supported on A2, and eager mode will get better performance.
+    * Add the `VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ` environment variable. It controls whether to enable MoE all2all seq, which provides a basic framework built on alltoall for easy expansion.
+
+* UT coverage reached 76.34% after a batch of PRs following this RFC: [#1298](https://github.com/vllm-project/vllm-ascend/issues/1298)
+* Sequence Parallelism works for Qwen3 MoE. [#2209](https://github.com/vllm-project/vllm-ascend/issues/2209)
+* Online documentation in Chinese is now available. [#1870](https://github.com/vllm-project/vllm-ascend/issues/1870)
+
+### Known Issues
+* Aclgraph does not currently work with DP + EP. The main gap is that the number of NPU streams Aclgraph needs to capture graphs is insufficient. [#2229](https://github.com/vllm-project/vllm-ascend/issues/2229)
+* There is an accuracy issue on W8A8 dynamic quantized DeepSeek with multistream enabled. This will be fixed in the next release. [#2232](https://github.com/vllm-project/vllm-ascend/issues/2232)
+* In Qwen3 MoE, SP cannot be incorporated into the Aclgraph. [#2246](https://github.com/vllm-project/vllm-ascend/issues/2246)
+* MTP does not currently support the V1 scheduler; a fix is planned for Q3. [#2254](https://github.com/vllm-project/vllm-ascend/issues/2254)
+* When running MTP with DP > 1, the metrics logger must be disabled due to an issue in vLLM. [#2254](https://github.com/vllm-project/vllm-ascend/issues/2254)
+
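
The v0.10.0rc1 notes say the allreduce/matmul fusion is enabled by setting `VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE` to `1`. A minimal sketch of preparing such an environment for a launcher process is shown below. The `ascend_env` helper is a hypothetical name, and the other new variables are left unset because the notes do not state which values they accept.

```python
import os


def ascend_env(base=None, matmul_allreduce=True):
    """Return a copy of the environment with the fusion switch applied.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    if matmul_allreduce:
        # Documented in the release note: set to "1" to enable the fusion.
        env["VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE"] = "1"
    return env


if __name__ == "__main__":
    env = ascend_env()
    print(env["VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE"])
    # The mapping could then be passed as `env=` to subprocess.run(...)
    # when starting the serving process; the command is not specified here.
```

Building a copy rather than mutating `os.environ` keeps the switch scoped to the child process that receives it.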
 ## v0.9.1rc2 - 2025.08.04
 This is the 2nd release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/) to get started.

--- a/docs/source/user_guide/support_matrix/supported_features.md
+++ b/docs/source/user_guide/support_matrix/supported_features.md
@@ -9,7 +9,6 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Chunked Prefill               | 🟢 Functional  | Functional, see detail note: [Chunked Prefill][cp]                     |
 | Automatic Prefix Caching      | 🟢 Functional  | Functional, see detail note: [vllm-ascend#732][apc]                    |
 | LoRA                          | 🟢 Functional  | [vllm-ascend#396][multilora], [vllm-ascend#893][v1 multilora]          |
-| Prompt adapter                | 🔴 No plan     | This feature has been deprecated by vLLM.                              |
 | Speculative decoding          | 🟢 Functional  | Basic support                                                          |
 | Pooling                       | 🟢 Functional  | CI needed and adapting more models; V1 support rely on vLLM support.   |
 | Enc-dec                       | 🟡 Planned     | vLLM should support this feature first.                                |
@@ -17,15 +16,13 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | LogProbs                      | 🟢 Functional  | CI needed                                                              |
 | Prompt logProbs               | 🟢 Functional  | CI needed                                                              |
 | Async output                  | 🟢 Functional  | CI needed                                                              |
-| Multi step scheduler          | 🔴 Deprecated  | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]     |
-| Best of                       | 🔴 Deprecated  | [vllm#13361][best_of]                                                  |
 | Beam search                   | 🟢 Functional  | CI needed                                                              |
 | Guided Decoding               | 🟢 Functional  | [vllm-ascend#177][guided_decoding]                                     |
 | Tensor Parallel               | 🟢 Functional  | Make TP >4 work with graph mode                                        |
 | Pipeline Parallel             | 🟢 Functional  | Write official guide and tutorial.                                     |
 | Expert Parallel               | 🟢 Functional  | Dynamic EPLB support.                                                  |
 | Data Parallel                 | 🟢 Functional  | Data Parallel support for Qwen3 MoE.                                   |
-| Prefill Decode Disaggregation | 🚧 WIP         | working on [1P1D] and xPyD.                                            |
+| Prefill Decode Disaggregation | 🟢 Functional  | Functional, xPyD is supported.                                         |
 | Quantization                  | 🟢 Functional  | W8A8 available; working on more quantization method support(W4A8, etc) |
 | Graph Mode                    | 🔵 Experimental| Experimental, see detail note: [vllm-ascend#767][graph_mode]           |
 | Sleep Mode                    | 🟢 Functional  |                                                                        |
@@ -38,10 +35,7 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th

 [v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
 [multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
-[best_of]: https://github.com/vllm-project/vllm/issues/13361
 [guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
-[v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
-[v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
 [multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
 [v1 multilora]: https://github.com/vllm-project/vllm-ascend/pull/893
 [graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767