From e20f0b1a0d2fdb1d86a15d55d70fe60a7a1b5a45 Mon Sep 17 00:00:00 2001 From: Mengqing Cao Date: Sun, 15 Mar 2026 22:47:47 +0800 Subject: [PATCH] [ReleaseNote] Add release note for v0.17.0rc1 (#7240) ### What this PR does / why we need it? This pull request adds the release notes for `v0.17.0rc1`. It also updates version numbers across various documentation files, including `README.md`, `README.zh.md`, `docs/source/community/versioning_policy.md`, and `docs/source/conf.py` to reflect the new release. - vLLM version: v0.17.0 - vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d --- .../schedule_image_build_and_push.yaml | 1 + .../schedule_update_estimated_time.yaml | 2 +- README.md | 4 +- README.zh.md | 4 +- docs/source/community/versioning_policy.md | 6 +- docs/source/conf.py | 8 +-- docs/source/faqs.md | 3 +- docs/source/user_guide/release_notes.md | 55 ++++++++++++++++++- 8 files changed, 69 insertions(+), 14 deletions(-) diff --git a/.github/workflows/schedule_image_build_and_push.yaml b/.github/workflows/schedule_image_build_and_push.yaml index ff00a4c7..99a8cf99 100644 --- a/.github/workflows/schedule_image_build_and_push.yaml +++ b/.github/workflows/schedule_image_build_and_push.yaml @@ -30,6 +30,7 @@ on: type: choice options: - main + - v0.17.0rc1 - v0.16.0rc1 - v0.15.0rc1 - v0.14.0rc1 diff --git a/.github/workflows/schedule_update_estimated_time.yaml b/.github/workflows/schedule_update_estimated_time.yaml index 904a1022..eb7b94f6 100644 --- a/.github/workflows/schedule_update_estimated_time.yaml +++ b/.github/workflows/schedule_update_estimated_time.yaml @@ -28,7 +28,7 @@ jobs: name: e2e-test strategy: matrix: - vllm_version: [v0.16.0] + vllm_version: [v0.17.0] type: [full, light] uses: ./.github/workflows/_e2e_test.yaml with: diff --git a/README.md b/README.md index 7c81773d..fce5d824 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ Please use the following recommended versions to get started quickly: | Version | Release type | Doc | |------------|--------------|--------------------------------------| -| v0.16.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details | +| v0.17.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details | | v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details | ## Contributing @@ -86,7 +86,7 @@ Below are the maintained branches: | Branch | Status | Note | |------------|--------------|--------------------------------------| -| main | Maintained | CI commitment for vLLM main branch and vLLM v0.16.0 tag | +| main | Maintained | CI commitment for vLLM main branch and vLLM v0.17.0 tag | | v0.7.1-dev | Unmaintained | Only doc fixes are allowed | | v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version, only bug fixes are allowed, and no new release tags anymore. 
| | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version | diff --git a/README.zh.md b/README.zh.md index 99e31342..4ce3e127 100644 --- a/README.zh.md +++ b/README.zh.md @@ -57,7 +57,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP | Version | Release type | Doc | |------------|--------------|--------------------------------------| -|v0.16.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)和[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多| +|v0.17.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)和[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多| |v0.13.0| 最新正式/稳定版本 |[快速开始](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [安装指南](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html)了解更多| ## 贡献 @@ -80,7 +80,7 @@ vllm-ascend有主干分支和开发分支。 | 分支 | 状态 | 备注 | |------------|------------|---------------------| -| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.16.0)CI看护 | +| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.17.0)CI看护 | | v0.7.1-dev | Unmaintained | 只允许文档修复 | | v0.7.3-dev | Maintained | 基于vLLM v0.7.3版本CI看护, 只允许Bug修复,不会再发布新版本 | | v0.9.1-dev | Maintained | 基于vLLM v0.9.1版本CI看护 | diff --git a/docs/source/community/versioning_policy.md b/docs/source/community/versioning_policy.md index ea52fe98..65bcad44 100644 --- a/docs/source/community/versioning_policy.md +++ b/docs/source/community/versioning_policy.md @@ -23,7 +23,8 @@ The table below is the release compatibility matrix for vLLM Ascend release. | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | Triton Ascend | |-------------|-------------------|-----------------|-------------|---------------------------------|---------------| -| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 | +| v0.17.0rc1 | v0.17.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 | +| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 | | v0.15.0rc1 | v0.15.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 | | v0.14.0rc1 | v0.14.1 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 | | v0.13.0 | v0.13.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.8.0.post2 | 3.2.0 | @@ -58,7 +59,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | |-------------|--------------|------------------|-------------|--------------------| -| main | 4034c3d32e30d01639459edd3ab486f56993876d, v0.16.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | +| main | 4034c3d32e30d01639459edd3ab486f56993876d, v0.17.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | ## Release cadence @@ -66,6 +67,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL | Date | Event | |------------|-------------------------------------------| +| 2026.03.15 | Release candidates, v0.17.0rc1 | | 2026.03.10 | Release candidates, v0.16.0rc1 | | 2026.02.27 | Release candidates, v0.15.0rc1 | | 2026.02.06 | v0.13.0 Final release, v0.13.0 | diff --git a/docs/source/conf.py b/docs/source/conf.py index 953c7210..a1661da4 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -65,15 +65,15 @@ myst_substitutions = { # the branch of vllm, used in vllm clone # - main branch: 'main' # - vX.Y.Z branch: 'vX.Y.Z' - "vllm_version": "v0.16.0", + "vllm_version": "v0.17.0", # the branch of vllm-ascend, used in vllm-ascend clone and image tag # - main branch: 'main' # - vX.Y.Z branch: 
latest vllm-ascend release tag
-    "vllm_ascend_version": "v0.16.0rc1",
+    "vllm_ascend_version": "v0.17.0rc1",
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    "pip_vllm_ascend_version": "0.16.0rc1",
-    "pip_vllm_version": "0.16.0",
+    "pip_vllm_ascend_version": "0.17.0rc1",
+    "pip_vllm_version": "0.17.0",
     # CANN image tag
     "cann_image_tag": "8.5.1-910b-ubuntu22.04-py3.11",
     # vllm version in ci
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index 46ef7bc1..8900b514 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -2,8 +2,7 @@
 
 ## Version Specific FAQs
 
-- [[v0.16.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6969)
-- [[v0.15.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6838)
+- [[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7173)
 - [[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6583)
 
 ## General FAQs
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index 8d6ed8e5..89b97078 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,58 @@
 # Release Notes
 
+## v0.17.0rc1 - 2026.03.15
+
+This is the first release candidate of v0.17.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
+
+### Highlights
+
+- The Ascend950 chip is now supported. [#7151](https://github.com/vllm-project/vllm-ascend/pull/7151)
+- ACLGraph (graph mode) is now supported for Model Runner V2. [#7110](https://github.com/vllm-project/vllm-ascend/pull/7110)
+- Unified parallelized speculative decoding is supported, allowing draft inference schemes to run in parallel. [#6766](https://github.com/vllm-project/vllm-ascend/pull/6766)
+
+### Features
+
+- The quantization format is now auto-detected from model files, and remote model IDs (e.g., `org/model-name`) are also supported, so `--quantization ascend` is no longer required. [#7111](https://github.com/vllm-project/vllm-ascend/pull/7111)
+- Qwen3.5 is supported starting from this version.
+- The FlashLB algorithm for EPLB is added, supporting per-step heat collection and multi-stage load balancing for better expert parallelism efficiency. [#6477](https://github.com/vllm-project/vllm-ascend/pull/6477)
+- LoRA with tensor parallelism and `--fully-sharded-loras` is now fixed and working. [#6650](https://github.com/vllm-project/vllm-ascend/pull/6650)
+- LMCacheAscendConnector is added as a new KV cache pooling solution for Ascend. [#6882](https://github.com/vllm-project/vllm-ascend/pull/6882)
+- W8A8C8 quantization is now supported for DeepSeek-V3.2 and GLM5 in the PD-mix scenario. [#7029](https://github.com/vllm-project/vllm-ascend/pull/7029)
+- [Experimental] The Minimax-m2.5 model is now supported on Ascend NPU. [#7105](https://github.com/vllm-project/vllm-ascend/pull/7105)
+- [Experimental] The Mooncake Layerwise Connector now supports the hybrid attention manager with multiple KV cache groups. [#7022](https://github.com/vllm-project/vllm-ascend/pull/7022)
+- [Experimental] Prefix cache is now supported for hybrid models. [#7103](https://github.com/vllm-project/vllm-ascend/pull/7103)
+
+### Performance
+
+- Pipeline Parallel now supports async scheduling, improving throughput for PP deployments. [#7136](https://github.com/vllm-project/vllm-ascend/pull/7136)
+- Improved TTFT when using the Mooncake connector by reducing logging overhead. [#6125](https://github.com/vllm-project/vllm-ascend/pull/6125)
+- KV Pool lookup is optimized for short sequences (token length < block_size). [#7146](https://github.com/vllm-project/vllm-ascend/pull/7146)
+- Fixed penalty ops in Model Runner V2, achieving a ~10% performance improvement. [#7013](https://github.com/vllm-project/vllm-ascend/pull/7013)
+
+### Documentation
+
+- Added EPD (Encode-Prefill-Decode) documentation and a load-balance proxy example. [#6221](https://github.com/vllm-project/vllm-ascend/pull/6221)
+- Added an Ascend PyTorch Profiler usage guide. [#7117](https://github.com/vllm-project/vllm-ascend/pull/7117)
+- Fixed the DSV3.1 PD configuration documentation. [#7187](https://github.com/vllm-project/vllm-ascend/pull/7187)
+
+### Others
+
+- Fix a drafter crash in full graph mode for speculative decoding. [#7158](https://github.com/vllm-project/vllm-ascend/pull/7158) [#7148](https://github.com/vllm-project/vllm-ascend/pull/7148)
+- Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights. [#7139](https://github.com/vllm-project/vllm-ascend/pull/7139)
+- Fix an ngram graph replay accuracy error on 310P. [#7134](https://github.com/vllm-project/vllm-ascend/pull/7134)
+- Fix FIA pad logic in graph mode after an upstream vLLM change. [#7144](https://github.com/vllm-project/vllm-ascend/pull/7144)
+- Fix a precision issue caused by an incorrect KV cache reshape for Qwen3.5. [#7209](https://github.com/vllm-project/vllm-ascend/pull/7209)
+- Fix extra processes being spawned on the rank0 device. [#7107](https://github.com/vllm-project/vllm-ascend/pull/7107)
+- Graph capture failures now properly raise exceptions for easier debugging. [#5644](https://github.com/vllm-project/vllm-ascend/pull/5644)
+- Fix the Qwen3.5 model by replacing `torch_npu.npu_recurrent_gated_delta_rule` with `fused_recurrent_gated_delta_rule`. [#7109](https://github.com/vllm-project/vllm-ascend/pull/7109)
+- Fix a bug when running Qwen3-Reranker-0.6B with LoRA. [#7156](https://github.com/vllm-project/vllm-ascend/pull/7156)
+
+### Known Issues
+
+- GLM5 requires `transformers==5.2.0`. This will be resolved by [vllm-project/vllm#30566](https://github.com/vllm-project/vllm/pull/30566), which will not be included in v0.17.0.
+- There is a precision issue with Qwen3-Next due to the changed TP weight split method. This will be fixed in the next release.
+- The minimum number of tokens for a prefix cache hit in hybrid models is currently 2k.
+
 ## v0.16.0rc1 - 2026.03.09
 
 This is the first release candidate of v0.16.0 for vLLM Ascend. Please follow th
 
 ### Deprecation & Breaking Changes
 
 - `enable_flash_comm_v1` config option has been renamed back to `enable_sp`. [#6883](https://github.com/vllm-project/vllm-ascend/pull/6883)
-- The auto-detect quantization format from model files is reverted, in v0.16.0rc1, we still need to add `---quantization ascend` to serve a model quantinized by modelslim. It will be added back in the next version after the bug with the remote model id is fixed. [#6873](https://github.com/vllm-project/vllm-ascend/pull/6873)
+- The auto-detect quantization format from model files is reverted; in v0.16.0rc1, we still need to add `--quantization ascend` to serve a model quantized by modelslim. It will be added back in the next version after the bug with the remote model ID is fixed. [#6873](https://github.com/vllm-project/vllm-ascend/pull/6873)
 
 ### Documentation