[ReleaseNote] Add Release Note for v0.10.1rc1 (#2635)
Add Release Note for v0.10.1rc1
- vLLM version: v0.10.1.1
- vLLM main: b5ee1e3261
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
@@ -52,7 +52,7 @@ Please use the following recommended versions to get started quickly:
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-|v0.10.0rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.10.1rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
 |v0.9.1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
 
 ## Contributing
@@ -53,7 +53,7 @@ The vLLM Ascend plugin (`vllm-ascend`) is a community-maintained [...] that lets vLLM [...] on Ascend NP
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-|v0.10.0rc1| Latest release candidate |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.10.1rc1| Latest release candidate |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
 |v0.9.1| Latest official/stable version |See [QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|
 
 ## Contributing
@@ -15,10 +15,22 @@
 
 Every vLLM Ascend release would not have been possible without the following contributors:
 
-Updated on 2025-06-10:
+Updated on 2025-09-03:
 
 | Number | Contributor | Date | Commit ID |
 |:------:|:-----------:|:-----:|:---------:|
+| 117 | [@panchao-hub](https://github.com/panchao-hub) | 2025/8/30 | [7215454](https://github.com/vllm-project/vllm-ascend/commit/7215454de6df78f4f9a49a99c5739f8bb360f5bc) |
+| 116 | [@lidenghui1110](https://github.com/lidenghui1110) | 2025/8/29 | [600b08f](https://github.com/vllm-project/vllm-ascend/commit/600b08f7542be3409c2c70927c91471e8de33d03) |
+| 115 | [@NSDie](https://github.com/NSDie) | 2025/8/28 | [1191a64](https://github.com/vllm-project/vllm-ascend/commit/1191a64ae508183d5613711bc98a90250963f83a) |
+| 114 | [@s-jiayang](https://github.com/s-jiayang) | 2025/8/27 | [6a4ec18](https://github.com/vllm-project/vllm-ascend/commit/6a4ec186e731b9516235f4fd30b5b98227513fe7) |
+| 113 | [@LookAround0301](https://github.com/LookAround0301) | 2025/8/22 | [e9fb895](https://github.com/vllm-project/vllm-ascend/commit/e9fb895b10cef37ea634f4d4af71686b09ca9f20) |
+| 112 | [@ZhaoJiangJiang](https://github.com/ZhaoJiangJiang) | 2025/8/22 | [3629bc4](https://github.com/vllm-project/vllm-ascend/commit/3629bc4431d3edb4224761f9036b3bddb16158d6) |
+| 111 | [@NicholasTao](https://github.com/NicholasTao) | 2025/8/20 | [7bec1a9](https://github.com/vllm-project/vllm-ascend/commit/7bec1a9b9c372785551d45682bf11063ec42b216) |
+| 110 | [@gameofdimension](https://github.com/gameofdimension) | 2025/8/19 | [27d038d](https://github.com/vllm-project/vllm-ascend/commit/27d038dc663bf550a35a8f15659493b2abefda07) |
+| 109 | [@liuchenbing](https://github.com/liuchenbing) | 2025/8/19 | [3648d18](https://github.com/vllm-project/vllm-ascend/commit/3648d18e673f15a33a82d6ea95d3a9dd891ff1f5) |
+| 108 | [@LCAIZJ](https://github.com/LCAIZJ) | 2025/8/18 | [03ca2b2](https://github.com/vllm-project/vllm-ascend/commit/03ca2b26ca9ab6b9a12f021b0595a726ee35e223) |
+| 107 | [@haojiangzheng](https://github.com/haojiangzheng) | 2025/8/11 | [eb43a47](https://github.com/vllm-project/vllm-ascend/commit/eb43a475f429192e7509e85e28b1c65d5097f373) |
+| 106 | [@QwertyJack](https://github.com/QwertyJack) | 2025/8/11 | [9c6d108](https://github.com/vllm-project/vllm-ascend/commit/9c6d108330574176f79eea52f989ea6049336af8) |
 | 105 | [@SlightwindSec](https://github.com/SlightwindSec) | 2025/8/5 | [f3b50c5](https://github.com/vllm-project/vllm-ascend/commit/f3b50c54e8243ad8ccefb9b033277fbdd382a9c4) |
 | 104 | [@CaveNightingale](https://github.com/CaveNightingale) | 2025/8/4 | [957c7f1](https://github.com/vllm-project/vllm-ascend/commit/957c7f108d5f0aea230220ccdc18d657229e4030) |
 | 103 | [@underfituu](https://github.com/underfituu) | 2025/8/4 | [e38fab0](https://github.com/vllm-project/vllm-ascend/commit/e38fab011d0b81f3a8e40d9bbe263c283dd4129b) |
@@ -22,6 +22,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
 |-------------|--------------|------------------|-------------|--------------------|--------------|
+| v0.10.1rc1 | v0.10.1/v0.10.1.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
 | v0.10.0rc1 | v0.10.0 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
 | v0.9.2rc1 | v0.9.2 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250619 | |
 | v0.9.1 | v0.9.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.5.1 / 2.5.1.post1 | |
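You can also apply the compatibility matrix programmatically before installing. The sketch below is minimal and assumes only the vLLM and Python columns matter. Its rows are transcribed from the matrix in this hunk, and `COMPAT_MATRIX`, `python_supported`, and `vllm_matches` are hypothetical names, not part of vllm-ascend.

```python
import sys

# Hypothetical lookup table: a subset of the Release Compatibility Matrix rows above.
# Python ranges are (inclusive lower bound, exclusive upper bound), i.e. ">= 3.9, < 3.12".
COMPAT_MATRIX = {
    "v0.10.1rc1": {"vllm": ["v0.10.1", "v0.10.1.1"], "python": ((3, 9), (3, 12))},
    "v0.10.0rc1": {"vllm": ["v0.10.0"], "python": ((3, 9), (3, 12))},
    "v0.9.2rc1": {"vllm": ["v0.9.2"], "python": ((3, 9), (3, 12))},
    "v0.9.1": {"vllm": ["v0.9.1"], "python": ((3, 9), (3, 12))},
}


def python_supported(release, version_info=sys.version_info):
    """Return True if the (major, minor) interpreter version falls inside the row's range."""
    low, high = COMPAT_MATRIX[release]["python"]
    return low <= tuple(version_info[:2]) < high


def vllm_matches(release, vllm_version):
    """Return True if vllm_version is one of the vLLM versions listed for the release."""
    return vllm_version in COMPAT_MATRIX[release]["vllm"]
```

For example, `python_supported("v0.10.1rc1")` checks the running interpreter. The sketch compares tuples because a plain string comparison would wrongly order `"3.9"` after `"3.11"`.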
@@ -41,6 +42,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | Date | Event |
 |------------|-------------------------------------------|
+| 2025.09.04 | Release candidates, v0.10.1rc1 |
 | 2025.09.03 | v0.9.1 Final release |
 | 2025.08.22 | Release candidates, v0.9.1rc3 |
 | 2025.08.07 | Release candidates, v0.10.0rc1 |
@@ -65,15 +65,15 @@ myst_substitutions = {
 # the branch of vllm, used in vllm clone
 # - main branch: 'main'
 # - vX.Y.Z branch: 'vX.Y.Z'
-'vllm_version': 'v0.10.0',
+'vllm_version': 'v0.10.1.1',
 # the branch of vllm-ascend, used in vllm-ascend clone and image tag
 # - main branch: 'main'
 # - vX.Y.Z branch: latest vllm-ascend release tag
-'vllm_ascend_version': 'v0.10.0rc1',
+'vllm_ascend_version': 'v0.10.1rc1',
 # the newest release version of vllm-ascend and matched vLLM, used in pip install.
 # This value should be updated whenever a release is cut.
-'pip_vllm_ascend_version': "0.10.0rc1",
-'pip_vllm_version': "0.10.0",
+'pip_vllm_ascend_version': "0.10.1rc1",
+'pip_vllm_version': "0.10.1.1",
 # CANN image tag
 'cann_image_tag': "8.2.rc1-910b-ubuntu22.04-py3.11",
 # vllm version in ci
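The four version values in this hunk move together: each pip version is the matching release tag without its leading `v`. The sketch below is a minimal pre-release consistency check. It assumes that convention holds only on release branches, because on main the branch values are `'main'`. `check_release_substitutions` and `VERSION_PAIRS` are hypothetical names, not part of the vllm-ascend docs build. The check uses `str.removeprefix`, so it needs Python 3.9 or later.

```python
# Values copied from the updated myst_substitutions entries in this hunk.
myst_substitutions = {
    "vllm_version": "v0.10.1.1",
    "vllm_ascend_version": "v0.10.1rc1",
    "pip_vllm_ascend_version": "0.10.1rc1",
    "pip_vllm_version": "0.10.1.1",
}

# Hypothetical (git tag key, pip version key) pairs that should be bumped together.
VERSION_PAIRS = [
    ("vllm_version", "pip_vllm_version"),
    ("vllm_ascend_version", "pip_vllm_ascend_version"),
]


def check_release_substitutions(subs):
    """Return a list of mismatches; each pip version should equal its tag minus the leading 'v'.

    Only meaningful on release branches, where the tag values are real versions.
    """
    problems = []
    for tag_key, pip_key in VERSION_PAIRS:
        expected = subs[tag_key].removeprefix("v")
        if subs[pip_key] != expected:
            problems.append(f"{pip_key} is {subs[pip_key]!r}, expected {expected!r} from {tag_key}")
    return problems
```

An empty list means the tag and pip values agree. A forgotten bump, such as leaving `pip_vllm_version` at `"0.10.0"`, produces one message naming the stale key.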
@@ -2,9 +2,8 @@
 
 ## Version Specific FAQs
 
-- [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
 - [[v0.9.1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2643)
-- [[v0.10.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2217)
+- [[v0.10.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2630)
 
 ## General FAQs
 
@@ -1,5 +1,55 @@
 # Release note
 
+## v0.10.1rc1 - 2025.09.04
+
+This is the 1st release candidate of v0.10.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.
+
+### Highlights
+- LoRA performance is greatly improved by custom kernels contributed by China Merchants Bank. [#2325](https://github.com/vllm-project/vllm-ascend/pull/2325)
+- Support Mooncake TransferEngine for KV cache registration and a pull_blocks-style disaggregated prefill implementation. [#1568](https://github.com/vllm-project/vllm-ascend/pull/1568)
+- Custom ops can now be captured into aclgraph. [#2113](https://github.com/vllm-project/vllm-ascend/pull/2113)
+
+### Core
+- Add MLP tensor parallelism to improve performance. Note that this increases memory usage. [#2120](https://github.com/vllm-project/vllm-ascend/pull/2120)
+- openEuler is upgraded to 24.03. [#2631](https://github.com/vllm-project/vllm-ascend/pull/2631)
+- Add custom lmhead tensor parallelism to reduce memory consumption and improve TPOT. [#2309](https://github.com/vllm-project/vllm-ascend/pull/2309)
+- Qwen3 MoE and Qwen2.5 now support torchair graph mode. [#2403](https://github.com/vllm-project/vllm-ascend/pull/2403)
+- Support Sliding Window Attention with AscendScheduler, which fixes the Gemma3 accuracy issue. [#2528](https://github.com/vllm-project/vllm-ascend/pull/2528)
+
+### Other
+
+- Bug fixes:
+* Update the graph capture size calculation, which partially alleviates the shortage of NPU streams in some scenarios. [#2511](https://github.com/vllm-project/vllm-ascend/pull/2511)
+* Fix bugs and refactor the cached mask generation logic. [#2442](https://github.com/vllm-project/vllm-ascend/pull/2442)
+* Fix the NZ format not taking effect in quantization scenarios. [#2549](https://github.com/vllm-project/vllm-ascend/pull/2549)
+* Fix an accuracy issue on Qwen series models caused by enabling `enable_shared_expert_dp` by default. [#2457](https://github.com/vllm-project/vllm-ascend/pull/2457)
+* Fix an accuracy issue on models whose RoPE dim is not equal to head dim, e.g., GLM4.5. [#2601](https://github.com/vllm-project/vllm-ascend/pull/2601)
+- Performance improvements across many PRs:
+* Remove torch.cat and replace it with List[0]. [#2153](https://github.com/vllm-project/vllm-ascend/pull/2153)
+* Convert the format of gmm to NZ. [#2474](https://github.com/vllm-project/vllm-ascend/pull/2474)
+* Optimize parallel strategies to reduce communication overhead. [#2198](https://github.com/vllm-project/vllm-ascend/pull/2198)
+* Optimize the rejection sampler in the greedy case. [#2137](https://github.com/vllm-project/vllm-ascend/pull/2137)
+- A batch of refactoring PRs to improve the code architecture:
+* Refactor MLA. [#2465](https://github.com/vllm-project/vllm-ascend/pull/2465)
+* Refactor torchair fused_moe. [#2438](https://github.com/vllm-project/vllm-ascend/pull/2438)
+* Refactor allgather/mc2-related fused_experts. [#2369](https://github.com/vllm-project/vllm-ascend/pull/2369)
+* Refactor the torchair model runner. [#2208](https://github.com/vllm-project/vllm-ascend/pull/2208)
+* Refactor CI. [#2276](https://github.com/vllm-project/vllm-ascend/pull/2276)
+- Parameter changes:
+* Add `lmhead_tensor_parallel_size` to `additional_config`; set it to enable lmhead tensor parallelism. [#2309](https://github.com/vllm-project/vllm-ascend/pull/2309)
+* The unused environment variables `HCCN_PATH`, `PROMPT_DEVICE_ID`, `DECODE_DEVICE_ID`, `LLMDATADIST_COMM_PORT` and `LLMDATADIST_SYNC_CACHE_WAIT_TIME` are removed. [#2448](https://github.com/vllm-project/vllm-ascend/pull/2448)
+* Environment variable `VLLM_LLMDD_RPC_PORT` is renamed to `VLLM_ASCEND_LLMDD_RPC_PORT`. [#2450](https://github.com/vllm-project/vllm-ascend/pull/2450)
+* Add environment variable `VLLM_ASCEND_ENABLE_MLP_OPTIMIZE`, which controls whether MLP optimization is enabled when tensor parallelism is enabled. This feature gives better performance in eager mode. [#2120](https://github.com/vllm-project/vllm-ascend/pull/2120)
+* Remove environment variables `MOE_ALL2ALL_BUFFER` and `VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ`. [#2612](https://github.com/vllm-project/vllm-ascend/pull/2612)
+* Add `enable_prefetch` to `additional_config`, which controls whether weight prefetch is enabled. [#2465](https://github.com/vllm-project/vllm-ascend/pull/2465)
+* Add `mode` to `additional_config.torchair_graph_config`. It must be set when using torchair's reduce-overhead mode. [#2461](https://github.com/vllm-project/vllm-ascend/pull/2461)
+* `enable_shared_expert_dp` in `additional_config` is now disabled by default; enabling it is recommended when running inference with DeepSeek models. [#2457](https://github.com/vllm-project/vllm-ascend/pull/2457)
+
+### Known Issues
+
+- Sliding window attention does not support chunked prefill yet, so it can only run with AscendScheduler enabled. [#2729](https://github.com/vllm-project/vllm-ascend/issues/2729)
+- There is a bug in creating mc2_mask when MultiStream is enabled; it will be fixed in the next release. [#2681](https://github.com/vllm-project/vllm-ascend/pull/2681)
+
 ## v0.9.1 - 2025.09.03
 
 We are excited to announce the newest official release of vLLM Ascend. This release includes many new features, performance improvements, and bug fixes. We recommend that users upgrade from 0.7.3 to this version. Please always set `VLLM_USE_V1=1` to use the V1 engine.
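The `additional_config` changes in the v0.10.1rc1 release note above are easier to see as one object. The sketch below is minimal and assumes the options are passed through vLLM's `--additional-config` option, which takes a JSON string. The key names `lmhead_tensor_parallel_size`, `enable_prefetch`, `enable_shared_expert_dp`, and `torchair_graph_config.mode` come from the release note. Every value, plus the `enabled` key and the `"reduce-overhead"` string under `torchair_graph_config`, is an illustrative assumption rather than a documented default.

```python
import json

# Key names from the v0.10.1rc1 parameter changes; all values are illustrative only.
additional_config = {
    "lmhead_tensor_parallel_size": 2,  # setting it enables lmhead tensor parallelism (#2309)
    "enable_prefetch": True,  # weight prefetch (#2465)
    "enable_shared_expert_dp": True,  # now off by default; recommended for DeepSeek (#2457)
    "torchair_graph_config": {
        "enabled": True,  # assumption: the switch that turns on torchair graph mode
        "mode": "reduce-overhead",  # assumption: value string for reduce-overhead mode (#2461)
    },
}

# Serialized form, e.g. for `vllm serve <model> --additional-config '<json>'`.
cli_value = json.dumps(additional_config)
print(cli_value)
```

Building the dict in Python and serializing it avoids hand-quoting nested JSON on the command line.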
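The environment-variable changes in the v0.10.1rc1 release note (one rename, several removals) can be handled in a launch script. The sketch below is minimal and assumes you want to carry an old `VLLM_LLMDD_RPC_PORT` value over to its new name and drop the removed variables with a warning. `migrate_env`, `RENAMED`, and `REMOVED` are hypothetical names, not part of vllm-ascend.

```python
import warnings

# Environment-variable changes listed under the v0.10.1rc1 parameter changes.
RENAMED = {"VLLM_LLMDD_RPC_PORT": "VLLM_ASCEND_LLMDD_RPC_PORT"}  # #2450
REMOVED = {
    "HCCN_PATH", "PROMPT_DEVICE_ID", "DECODE_DEVICE_ID",  # #2448
    "LLMDATADIST_COMM_PORT", "LLMDATADIST_SYNC_CACHE_WAIT_TIME",  # #2448
    "MOE_ALL2ALL_BUFFER", "VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ",  # #2612
}


def migrate_env(env):
    """Return an updated copy of env for v0.10.1rc1 (hypothetical helper); env is not modified."""
    updated = dict(env)
    for old_name, new_name in RENAMED.items():
        if old_name in updated:
            value = updated.pop(old_name)
            # If the new name is already set explicitly, keep that value instead.
            updated.setdefault(new_name, value)
    # Iterate over a sorted copy so deleting keys is safe and warnings are deterministic.
    for name in sorted(REMOVED.intersection(updated)):
        warnings.warn(f"{name} was removed in vllm-ascend v0.10.1rc1; dropping it", stacklevel=2)
        del updated[name]
    return updated
```

A launch script could pass `dict(os.environ)` and export the result. Preferring an explicitly set new name over the old one is a design choice of this sketch.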