[Doc][ReleaseNote] Add release notes for v0.16.0rc1 (#7067)
Add release notes for v0.16.0rc1
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
@@ -63,7 +63,7 @@ Please use the following recommended versions to get started quickly:

 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-| v0.14.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
+| v0.16.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
 | v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details |

 ## Contributing
@@ -86,7 +86,7 @@ Below are the maintained branches:

 | Branch | Status | Note |
 |------------|--------------|--------------------------------------|
-| main | Maintained | CI commitment for vLLM main branch and vLLM v0.13.0 tag |
+| main | Maintained | CI commitment for vLLM main branch and vLLM v0.16.0 tag |
 | v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
 | v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version, only bug fixes are allowed, and no new release tags anymore. |
 | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
@@ -57,7 +57,7 @@ The vLLM Ascend plugin (`vllm-ascend`) is a community-maintained plugin that lets vLLM run on Ascend NP

 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-|v0.14.0rc1| Latest release candidate |See [Quick Start](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation Guide](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details|
+|v0.16.0rc1| Latest release candidate |See [Quick Start](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation Guide](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details|
 |v0.13.0| Latest official/stable release |See [Quick Start](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation Guide](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details|

 ## Contributing
@@ -80,7 +80,7 @@ vllm-ascend has a main branch and development branches.

 | Branch | Status | Note |
 |------------|------------|---------------------|
-| main | Maintained | CI commitment for the vLLM main branch and the latest vLLM release (v0.13.0) |
+| main | Maintained | CI commitment for the vLLM main branch and the latest vLLM release (v0.16.0) |
 | v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
 | v0.7.3-dev | Maintained | CI commitment for vLLM v0.7.3; only bug fixes are allowed, and no new releases will be published |
 | v0.9.1-dev | Maintained | CI commitment for vLLM v0.9.1 |
@@ -23,6 +23,7 @@ The table below is the release compatibility matrix for vLLM Ascend release.

 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | Triton Ascend |
 |-------------|-------------------|-----------------|-------------|---------------------------------|---------------|
+| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.15.0rc1 | v0.15.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.14.0rc1 | v0.14.1 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.13.0 | v0.13.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.8.0.post2 | 3.2.0 |
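The Python column in the compatibility matrix is a half-open range: 3.10 and 3.11 qualify, 3.12 does not. A minimal sketch of checking an interpreter against the v0.16.0rc1 row, assuming only the major and minor version matter; `V0_16_0RC1` and `python_supported` are hypothetical names, not part of vLLM Ascend.

```python
import sys

# Hypothetical encoding of the v0.16.0rc1 row of the compatibility matrix.
# Python ">= 3.10, < 3.12" is stored as the half-open interval [python_min, python_max).
V0_16_0RC1 = {
    "vllm": "0.16.0",
    "python_min": (3, 10),
    "python_max": (3, 12),  # exclusive upper bound
}


def python_supported(version, row=V0_16_0RC1):
    """Return True if the (major, minor) part of `version` lies in the row's range."""
    major_minor = tuple(version[:2])
    return row["python_min"] <= major_minor < row["python_max"]


if __name__ == "__main__":
    ok = python_supported(sys.version_info)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {ok}")
```

Comparing `(major, minor)` tuples avoids string-comparison pitfalls such as `"3.9" > "3.10"` evaluating to true.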
@@ -65,6 +66,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL

 | Date | Event |
 |------------|-------------------------------------------|
+| 2026.03.10 | Release candidates, v0.16.0rc1 |
 | 2026.02.27 | Release candidates, v0.15.0rc1 |
 | 2026.02.06 | v0.13.0 Final release, v0.13.0 |
 | 2026.01.26 | Release candidates, v0.14.0rc1 |
@@ -122,7 +124,7 @@ Usually, each minor version of vLLM (such as 0.7) corresponds to a vLLM Ascend v

 | Branch | State | Note |
 | ---------- | ------------ | -------------------------------------------------------- |
-| main | Maintained | CI commitment for vLLM main branch and vLLM 0.13.0 tag |
+| main | Maintained | CI commitment for vLLM main branch and vLLM 0.16.0 tag |
 | releases/v0.13.0 | Maintained | CI commitment for vLLM 0.13.0 version |
 | v0.11.0-dev| Maintained | CI commitment for vLLM 0.11.0 version |
 | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
@@ -65,15 +65,15 @@ myst_substitutions = {
 # the branch of vllm, used in vllm clone
 # - main branch: 'main'
 # - vX.Y.Z branch: 'vX.Y.Z'
-"vllm_version": "v0.15.0",
+"vllm_version": "v0.16.0",
 # the branch of vllm-ascend, used in vllm-ascend clone and image tag
 # - main branch: 'main'
 # - vX.Y.Z branch: latest vllm-ascend release tag
-"vllm_ascend_version": "v0.15.0rc1",
+"vllm_ascend_version": "v0.16.0rc1",
 # the newest release version of vllm-ascend and matched vLLM, used in pip install.
 # This value should be updated when cut down release.
-"pip_vllm_ascend_version": "0.15.0rc1",
-"pip_vllm_version": "0.15.0",
+"pip_vllm_ascend_version": "0.16.0rc1",
+"pip_vllm_version": "0.16.0",
 # CANN image tag
 "cann_image_tag": "8.5.0-910b-ubuntu22.04-py3.11",
 # vllm version in ci
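The version bump in the `myst_substitutions` hunk touches four values that must agree: the git tags used for cloning carry a leading `v`, while the pip versions do not. A minimal sketch of a consistency check, assuming the tag form shown here (it does not handle the `'main'` value the comments mention for the main branch); `check_release_substitutions` and `pip_install_command` are hypothetical helpers, not part of the repository, and the rendered pip line is illustrative only.

```python
# Values copied from the new side of the myst_substitutions hunk.
substitutions = {
    "vllm_version": "v0.16.0",
    "vllm_ascend_version": "v0.16.0rc1",
    "pip_vllm_ascend_version": "0.16.0rc1",
    "pip_vllm_version": "0.16.0",
}


def check_release_substitutions(subs):
    """Return a list of tag/pip-version mismatches (empty when consistent)."""
    problems = []
    # Each clone tag is expected to be the pip version with a leading "v".
    if subs["vllm_version"] != "v" + subs["pip_vllm_version"]:
        problems.append("vllm_version does not match pip_vllm_version")
    if subs["vllm_ascend_version"] != "v" + subs["pip_vllm_ascend_version"]:
        problems.append("vllm_ascend_version does not match pip_vllm_ascend_version")
    return problems


def pip_install_command(subs):
    """Render a pinned pip install line from the substitutions (illustrative only)."""
    return (
        f"pip install vllm=={subs['pip_vllm_version']} "
        f"vllm-ascend=={subs['pip_vllm_ascend_version']}"
    )


if __name__ == "__main__":
    print(check_release_substitutions(substitutions))
    print(pip_install_command(substitutions))
```

Forgetting one of the four values during a release bump (for example, leaving `pip_vllm_version` at `0.15.0`) makes the check report the mismatching pair instead of silently producing docs that pin two different vLLM versions.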
@@ -2,6 +2,7 @@

 ## Version Specific FAQs

+- [[v0.16.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6969)
 - [[v0.15.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6838)
 - [[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6583)

@@ -1,5 +1,96 @@
 # Release Notes

+## v0.16.0rc1 - 2026.03.09
+
+This is the first release candidate of v0.16.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
+
+### Highlights
+
+- Quantization adaptation and optimization for Qwen3-Omni are now available. [#6828](https://github.com/vllm-project/vllm-ascend/pull/6828)
+- GLM5-W8A8 quantization is now supported, enabled by parameterizing the previously hardcoded MLA dimensions. [#6902](https://github.com/vllm-project/vllm-ascend/pull/6902)
+
+### Features
+
+- [Experimental] FabricMem mode is now supported for the ADXL/HIXL interconnect. [#6806](https://github.com/vllm-project/vllm-ascend/pull/6806)
+- Qwen3-Next now supports FlashComm1. [#6830](https://github.com/vllm-project/vllm-ascend/pull/6830)
+- The NPUWorker profiler now supports `profile_prefix` for a better profiling experience. [#6968](https://github.com/vllm-project/vllm-ascend/pull/6968)
+- EPLB profiling now displays an expert hotness comparison and the time required for EPLB adjustment. [#6877](https://github.com/vllm-project/vllm-ascend/pull/6877) [#7001](https://github.com/vllm-project/vllm-ascend/pull/7001)
+- Xlite Qwen3 MoE now supports Data Parallel. [#6715](https://github.com/vllm-project/vllm-ascend/pull/6715)
+- Mooncake Layerwise Connector now supports kv_pool. [#7032](https://github.com/vllm-project/vllm-ascend/pull/7032)
+- Eagle3 now supports QuaRot quantization without embedding. [#7038](https://github.com/vllm-project/vllm-ascend/pull/7038)
+
+### Hardware and Operator Support
+
+- 310P now supports the w8a8sc quantization method. [#7075](https://github.com/vllm-project/vllm-ascend/pull/7075)
+- Added the AscendC causal_conv1d_fn operator for Qwen3-Next. [#6661](https://github.com/vllm-project/vllm-ascend/pull/6661)
+- Added the Ascend Ops recurrent_gated_delta_rule operator. [#6725](https://github.com/vllm-project/vllm-ascend/pull/6725)
+- Added a GMM custom operator for MoE models. [#7010](https://github.com/vllm-project/vllm-ascend/pull/7010)
+
+### Performance
+
+- Faster convolution computation improves TTFT by 0.95% and throughput by 0.59% for Qwen3-VL models. [#7017](https://github.com/vllm-project/vllm-ascend/pull/7017)
+- Optimize split_qkv_rmsnorm_rope operator. [#6827](https://github.com/vllm-project/vllm-ascend/pull/6827)
+- Implement global CPU slicing and improve IRQ binding for Ascend NPUs, ensuring non-overlapping CPU partitions and better resource management. [#6945](https://github.com/vllm-project/vllm-ascend/pull/6945)
+- Optimize MTP execution by reordering state update operation. [#6844](https://github.com/vllm-project/vllm-ascend/pull/6844)
+- Avoid CPU sync in mrope_positions copy by using full tensor copy. [#7014](https://github.com/vllm-project/vllm-ascend/pull/7014)
+- Remove H2D synchronization for expert_map in MoE models. [#7000](https://github.com/vllm-project/vllm-ascend/pull/7000)
+
+### Dependencies
+
+- CANN is upgraded to 8.5.1. If you are not using the official image, please upgrade CANN manually. [#6897](https://github.com/vllm-project/vllm-ascend/pull/6897)
+
+### Deprecation & Breaking Changes
+
+- The `enable_flash_comm_v1` config option has been renamed back to `enable_sp`. [#6883](https://github.com/vllm-project/vllm-ascend/pull/6883)
+- Automatic detection of the quantization format from model files has been reverted. In v0.16.0rc1, you still need to pass `--quantization ascend` to serve a model quantized by modelslim. Auto-detection will be added back in the next version after the bug with remote model IDs is fixed. [#6873](https://github.com/vllm-project/vllm-ascend/pull/6873)
+
+### Documentation
+
+- Added a user/developer guide for CPU binding. [#7045](https://github.com/vllm-project/vllm-ascend/pull/7045)
+- Added metrics usage documentation and an example. [#6962](https://github.com/vllm-project/vllm-ascend/pull/6962)
+- Added llms.txt for LLM discovery. [#6886](https://github.com/vllm-project/vllm-ascend/pull/6886)
+- Added a GLM4.x multi-node deployment tutorial. [#6872](https://github.com/vllm-project/vllm-ascend/pull/6872)
+- Added an explanation of the 310P-specific parameter `max-model-len`. [#7065](https://github.com/vllm-project/vllm-ascend/pull/7065)
+
+### Others
+
+- Fix openEuler Dockerfile error. [#6871](https://github.com/vllm-project/vllm-ascend/pull/6871)
+- Many bug fixes, including:
+  - Fix Eagle speculative decoding with Context Parallel enabled. [#6981](https://github.com/vllm-project/vllm-ascend/pull/6981) [#7079](https://github.com/vllm-project/vllm-ascend/pull/7079)
+  - Fix LoRA accuracy issue introduced by upstream vLLM changes. [#6958](https://github.com/vllm-project/vllm-ascend/pull/6958)
+  - Fix streaming content-type in load balance proxy server. [#6985](https://github.com/vllm-project/vllm-ascend/pull/6985)
+  - Fix metadata execute error: integer modulo by zero. [#6521](https://github.com/vllm-project/vllm-ascend/pull/6521)
+  - Fix triton rope_siso implementation bug. [#7082](https://github.com/vllm-project/vllm-ascend/pull/7082)
+  - Fix incorrect layer count for MTP models in update_aclgraph_sizes. [#7064](https://github.com/vllm-project/vllm-ascend/pull/7064)
+  - Fix compilation errors for CANN versions later than b020. [#7059](https://github.com/vllm-project/vllm-ascend/pull/7059)
+  - Fix quant config support in GLM4.6V. [#7062](https://github.com/vllm-project/vllm-ascend/pull/7062)
+  - Fix parameter ordering bug in _merge_multimodal_embeddings. [#7068](https://github.com/vllm-project/vllm-ascend/pull/7068)
+  - Fix fused mc2 bug in EPLB. [#6794](https://github.com/vllm-project/vllm-ascend/pull/6794)
+  - Fix kernel block size for computing slot mapping. [#7019](https://github.com/vllm-project/vllm-ascend/pull/7019)
+  - Fix layerwise stacking MTP error in P/D disaggregation. [#7036](https://github.com/vllm-project/vllm-ascend/pull/7036)
+  - Fix RoPE dimension for npu_rotary_embedding. [#6880](https://github.com/vllm-project/vllm-ascend/pull/6880)
+  - Fix Qwen-Omni quantization bugs. [#7042](https://github.com/vllm-project/vllm-ascend/pull/7042) [#7007](https://github.com/vllm-project/vllm-ascend/pull/7007)
+  - Fix GDN layer accuracy in graph mode. [#6822](https://github.com/vllm-project/vllm-ascend/pull/6822)
+  - Fix precision bugs for PCP/DCP in PD disaggregation. [#6876](https://github.com/vllm-project/vllm-ascend/pull/6876)
+  - Fix MTP in PD disaggregation with fullgraph support for all D-Nodes. [#6948](https://github.com/vllm-project/vllm-ascend/pull/6948)
+  - Fix GQA model error when enabling both DP and DCP. [#7012](https://github.com/vllm-project/vllm-ascend/pull/7012)
+  - Fix an edge case where MTP prefill is misclassified as decode. [#6835](https://github.com/vllm-project/vllm-ascend/pull/6835)
+  - Fix Eagle3 acceptance rate for QuaRot quantized models. [#6914](https://github.com/vllm-project/vllm-ascend/pull/6914)
+  - Fix RoPE shape mismatch for MTP models with FlashComm V1 enabled. [#6939](https://github.com/vllm-project/vllm-ascend/pull/6939)
+  - Fix Qwen2.5VL accuracy issue. [#6975](https://github.com/vllm-project/vllm-ascend/pull/6975)
+  - Fix MoE forward error with static kernel enabled. [#6964](https://github.com/vllm-project/vllm-ascend/pull/6964)
+  - Fix muls_add fusion for GLM5 models. [#6928](https://github.com/vllm-project/vllm-ascend/pull/6928)
+  - Fix GDN layer detection for multimodal models. [#6941](https://github.com/vllm-project/vllm-ascend/pull/6941)
+  - Fix weight nd2nz error for unquantized models on 300I. [#6851](https://github.com/vllm-project/vllm-ascend/pull/6851)
+  - Fix CPU binding logic. [#6889](https://github.com/vllm-project/vllm-ascend/pull/6889)
+  - Fix Eagle fullgraph shape capture. [#6846](https://github.com/vllm-project/vllm-ascend/pull/6846)
+
+### Known Issues
+
+- Currently, for DeepSeek V3.2, PCP and DCP do not yet work with the FlashComm1 feature; combining them may cause serving errors or other unknown errors.
+- In a 4-node A3 PD disaggregation deployment with DeepSeek V3.2, the P-Node may hang when benchmarking in high-concurrency scenarios, e.g., 2K/2K tokens with 512 concurrent requests.
+- MTP with large EP configurations may cause a graph capture buffer overflow. This is a bug that needs to be fixed in vLLM; as a workaround, explicitly set `--compilation-config '{"max_cudagraph_capture_size": N}'`, where `N = max_concurrency × (1 + num_speculative_tokens)`.
+
 ## v0.15.0rc1 - 2026.02.27

 This is the first release candidate of v0.15.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
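Two entries in these release notes are command-line workarounds: passing `--quantization ascend` for modelslim-quantized models, and setting `max_cudagraph_capture_size` to `N = max_concurrency × (1 + num_speculative_tokens)` for MTP with large EP. A minimal sketch that assembles both into one launch command, assuming a plain `vllm serve` invocation; the model path, the concurrency value, and the helper names `max_capture_size` and `serve_args` are hypothetical, and every other flag (including the speculative-decoding configuration itself) is left out.

```python
import json
import shlex


def max_capture_size(max_concurrency, num_speculative_tokens):
    """Return N = max_concurrency * (1 + num_speculative_tokens)."""
    if max_concurrency < 1 or num_speculative_tokens < 0:
        raise ValueError("need max_concurrency >= 1 and num_speculative_tokens >= 0")
    return max_concurrency * (1 + num_speculative_tokens)


def serve_args(model, max_concurrency, num_speculative_tokens, modelslim_quantized=False):
    """Build a `vllm serve` argument list applying both v0.16.0rc1 workarounds.

    Other flags, including the speculative-decoding config, are deliberately omitted.
    """
    args = ["vllm", "serve", model]
    if modelslim_quantized:
        # v0.16.0rc1 does not auto-detect the modelslim format, so pass it explicitly.
        args += ["--quantization", "ascend"]
    n = max_capture_size(max_concurrency, num_speculative_tokens)
    # Explicitly set the maximum graph capture size to N, per the Known Issues workaround.
    args += ["--compilation-config", json.dumps({"max_cudagraph_capture_size": n})]
    return args


if __name__ == "__main__":
    # Placeholder values: 64 concurrent requests, 1 speculative token, so N = 64 * (1 + 1) = 128.
    print(shlex.join(serve_args("/path/to/model", 64, 1, modelslim_quantized=True)))
```

`shlex.join` quotes the JSON argument with single quotes, matching the `--compilation-config '{...}'` form used in the Known Issues entry, so the printed line can be pasted into a shell.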