From 9c560b009ae0439706b99856c967f511e8df4302 Mon Sep 17 00:00:00 2001
From: wangxiyuan
Date: Fri, 11 Jul 2025 17:36:05 +0800
Subject: [PATCH] [Release] Add 0.9.2rc1 release note (#1725)

Add the release note for 0.9.2rc1; we'll release it soon.

- vLLM version: v0.9.2
- vLLM main: https://github.com/vllm-project/vllm/commit/7bd4c37ae7c6f2223c1a031bbdd2e3435d53da94

Signed-off-by: wangxiyuan
---
 README.md                                  |  3 ++-
 README.zh.md                               |  3 ++-
 docs/source/community/contributors.md      | 17 ++++++++++++
 docs/source/community/versioning_policy.md |  2 ++
 docs/source/conf.py                        |  8 +++---
 docs/source/faqs.md                        | 18 ++++++-------
 docs/source/installation.md                |  2 +-
 docs/source/user_guide/release_notes.md    | 55 ++++++++++++++++++++++++++++++++++
 8 files changed, 92 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 7e5918c..ac8820d 100644
--- a/README.md
+++ b/README.md
@@ -65,7 +65,8 @@ Below is maintained branches:
 |------------|--------------|--------------------------------------|
 | main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.x branch |
 | v0.7.1-dev | Unmaintained | Only doc fixed is allowed |
-| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version |
+| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version; only bug fixes are allowed and no new release tags will be created. |
+| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
 
 Please refer to [Versioning policy](https://vllm-ascend.readthedocs.io/en/latest/community/versioning_policy.html) for more details.
 
diff --git a/README.zh.md b/README.zh.md
index 55a40f5..cb296e3 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -65,7 +65,8 @@ vllm-ascend有主干分支和开发分支。
 |------------|------------|---------------------|
 | main | Maintained | 基于vLLM main分支CI看护 |
 | v0.7.1-dev | Unmaintained | 只允许文档修复 |
-| v0.7.3-dev | Maintained | 基于vLLM v0.7.3版本CI看护 |
+| v0.7.3-dev | Maintained | 基于vLLM v0.7.3版本CI看护,只允许Bug修复,不会再发布新版本 |
+| v0.9.1-dev | Maintained | 基于vLLM v0.9.1版本CI看护 |
 
 请参阅[版本策略](https://vllm-ascend.readthedocs.io/en/latest/community/versioning_policy.html)了解更多详细信息。
 
diff --git a/docs/source/community/contributors.md b/docs/source/community/contributors.md
index 16ea600..dd0e9ee 100644
--- a/docs/source/community/contributors.md
+++ b/docs/source/community/contributors.md
@@ -17,6 +17,23 @@ Updated on 2025-06-10:
 
 | Number | Contributor | Date | Commit ID |
 |:------:|:-----------:|:-----:|:---------:|
+| 83 | [@ZhengWG](https://github.com/) | 2025/7/7 | [3a469de](https://github.com/vllm-project/vllm-ascend/commit/9c886d0a1f0fc011692090b0395d734c83a469de) |
+| 82 | [@wm901115nwpu](https://github.com/) | 2025/7/7 | [a2a47d4](https://github.com/vllm-project/vllm-ascend/commit/f08c4f15a27f0f27132f4ca7a0c226bf0a2a47d4) |
+| 81 | [@Agonixiaoxiao](https://github.com/) | 2025/7/2 | [6f84576](https://github.com/vllm-project/vllm-ascend/commit/7fc1a984890bd930f670deedcb2dda3a46f84576) |
+| 80 | [@zhanghw0354](https://github.com/zhanghw0354) | 2025/7/2 | [d3df9a5](https://github.com/vllm-project/vllm-ascend/commit/9fb3d558e5b57a3c97ee5e11b9f5dba6ad3df9a5) |
+| 79 | [@GDzhu01](https://github.com/GDzhu01) | 2025/6/28 | [de256ac](https://github.com/vllm-project/vllm-ascend/commit/b308a7a25897b88d4a23a9e3d583f4ec6de256ac) |
+| 78 | [@leo-pony](https://github.com/leo-pony) | 2025/6/26 | [3f2a5f2](https://github.com/vllm-project/vllm-ascend/commit/10253449120307e3b45f99d82218ba53e3f2a5f2) |
+| 77 | [@zeshengzong](https://github.com/zeshengzong) | 2025/6/26 | [3ee25aa](https://github.com/vllm-project/vllm-ascend/commit/192dbbcc6e244a8471d3c00033dc637233ee25aa) |
+| 76 | [@sharonyunyun](https://github.com/sharonyunyun) | 2025/6/25 | [2dd8666](https://github.com/vllm-project/vllm-ascend/commit/941269a6c5bbc79f6c1b6abd4680dc5802dd8666) |
+| 75 | [@Pr0Wh1teGivee](https://github.com/Pr0Wh1teGivee) | 2025/6/25 | [c65dd40](https://github.com/vllm-project/vllm-ascend/commit/2fda60464c287fe456b4a2f27e63996edc65dd40) |
+| 74 | [@xleoken](https://github.com/xleoken) | 2025/6/23 | [c604de0](https://github.com/vllm-project/vllm-ascend/commit/4447e53d7ad5edcda978ca6b0a3a26a73c604de0) |
+| 73 | [@lyj-jjj](https://github.com/lyj-jjj) | 2025/6/23 | [5cbd74e](https://github.com/vllm-project/vllm-ascend/commit/5177bef87a21331dcca11159d3d1438075cbd74e) |
+| 72 | [@farawayboat](https://github.com/farawayboat) | 2025/6/21 | [bc7d392](https://github.com/vllm-project/vllm-ascend/commit/097e7149f75c0806774bc68207f0f6270bc7d392) |
+| 71 | [@yuancaoyaoHW](https://github.com/yuancaoyaoHW) | 2025/6/20 | [7aa0b94](https://github.com/vllm-project/vllm-ascend/commit/00ae250f3ced68317bc91c93dc1f1a0977aa0b94) |
+| 70 | [@songshanhu07](https://github.com/songshanhu07) | 2025/6/18 | [5e1de1f](https://github.com/vllm-project/vllm-ascend/commit/2a70dbbdb8f55002de3313e17dfd595e1de1f) |
+| 69 | [@wangyanhui-cmss](https://github.com/wangyanhui-cmss) | 2025/6/12 | [40c9e88](https://github.com/vllm-project/vllm-ascend/commit/2a5fb4014b863cee6abc3009f5bc5340c9e88) |
+| 68 | [@chenwaner](https://github.com/chenwaner) | 2025/6/11 | [c696169](https://github.com/vllm-project/vllm-ascend/commit/e46dc142bf1180453c64226d76854fc1ec696169) |
+| 67 | [@yzim](https://github.com/yzim) | 2025/6/11 | [aaf701b](https://github.com/vllm-project/vllm-ascend/commit/4153a5091b698c2270d160409e7fee73baaf701b) |
 | 66 | [@Yuxiao-Xu](https://github.com/Yuxiao-Xu) | 2025/6/9 | [6b853f1](https://github.com/vllm-project/vllm-ascend/commit/6b853f15fe69ba335d2745ebcf14a164d0bcc505) |
 | 65 | [@ChenTaoyu-SJTU](https://github.com/ChenTaoyu-SJTU) | 2025/6/7 | [20dedba](https://github.com/vllm-project/vllm-ascend/commit/20dedba5d1fc84b7ae8b49f9ce3e3649389e2193) |
 | 64 | [@zxdukki](https://github.com/zxdukki) | 2025/6/7 | [87ebaef](https://github.com/vllm-project/vllm-ascend/commit/87ebaef4e4e519988f27a6aa378f614642202ecf) |
diff --git a/docs/source/community/versioning_policy.md b/docs/source/community/versioning_policy.md
index 386b2b0..2dd13e1 100644
--- a/docs/source/community/versioning_policy.md
+++ b/docs/source/community/versioning_policy.md
@@ -22,6 +22,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
 |-------------|--------------|------------------|-------------|--------------------|--------------|
+| v0.9.2rc1 | v0.9.2 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250619 | |
 | v0.9.1rc1 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250528 | |
 | v0.9.0rc2 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
 | v0.9.0rc1 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
@@ -36,6 +37,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | Date | Event |
 |------------|-------------------------------------------|
+| 2025.07.11 | Release candidates, v0.9.2rc1 |
 | 2025.06.22 | Release candidates, v0.9.1rc1 |
 | 2025.06.10 | Release candidates, v0.9.0rc2 |
 | 2025.06.09 | Release candidates, v0.9.0rc1 |
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 88cb647..551c3b2 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -65,15 +65,15 @@ myst_substitutions = {
     # the branch of vllm, used in vllm clone
     # - main branch: 'main'
     # - vX.Y.Z branch: 'vX.Y.Z'
-    'vllm_version': 'v0.9.1',
+    'vllm_version': 'v0.9.2',
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.9.1rc1',
+    'vllm_ascend_version': 'v0.9.2rc1',
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.9.1rc1",
-    'pip_vllm_version': "0.9.1",
+    'pip_vllm_ascend_version': "0.9.2rc1",
+    'pip_vllm_version': "0.9.2",
     # CANN image tag
     'cann_image_tag': "8.1.rc1-910b-ubuntu22.04-py3.10",
     # vllm version in ci
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index ac43bef..a0de425 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -3,19 +3,19 @@
 ## Version Specific FAQs
 
 - [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
-- [[v0.9.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1351)
+- [[v0.9.2rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1742)
 
 ## General FAQs
 
 ### 1. What devices are currently supported?
 
-Currently, **ONLY Atlas A2 series** (Ascend-cann-kernels-910b) are supported:
+Currently, **ONLY** the Atlas A2 series (Ascend-cann-kernels-910b) and the Atlas 300I series (Ascend-cann-kernels-310p) are supported:
 
 - Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
 - Atlas 800I A2 Inference series (Atlas 800I A2)
+- Atlas 300I Inference series (Atlas 300I Duo)
 
 Below series are NOT supported yet:
 
-- Atlas 300I Duo、Atlas 300I Pro (Ascend-cann-kernels-310p) might be supported on 2025.Q2
 - Atlas 200I A2 (Ascend-cann-kernels-310b) unplanned yet
 - Ascend 910, Ascend 910 Pro B (Ascend-cann-kernels-910) unplanned yet
@@ -35,7 +35,7 @@ docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:$TAG
 
 ### 3. What models does vllm-ascend supports?
 
-Find more details [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/supported_models.html).
+Find more details [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/support_matrix/supported_models.html).
 
 ### 4. How to get in touch with our community?
@@ -48,7 +48,7 @@ There are many channels that you can communicate with our community developers /
 
 ### 5. What features does vllm-ascend V1 supports?
 
-Find more details [here](https://github.com/vllm-project/vllm-ascend/issues/414).
+Find more details [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html).
 
 ### 6. How to solve the problem of "Failed to infer device type" or "libatb.so: cannot open shared object file"?
@@ -69,7 +69,7 @@ If all above steps are not working, feel free to submit a GitHub issue.
 
 ### 7. How does vllm-ascend perform?
 
-Currently, only some models are improved. Such as `Qwen2 VL`, `Deepseek V3`. Others are not good enough. From 0.9.0rc2, Qwen and Deepseek works with graph mode to play a good performance. What's more, you can install `mindie-turbo` with `vllm-ascend v0.7.3` to speed up the inference as well.
+Currently, only some models are well optimized, such as `Qwen2.5 VL`, `Qwen3` and `Deepseek V3`; others may not perform well enough yet. Since 0.9.0rc2, Qwen and Deepseek models work with graph mode for good performance. What's more, you can install `mindie-turbo` with `vllm-ascend v0.7.3` to speed up inference as well.
 
 ### 8. How vllm-ascend work with vllm?
 vllm-ascend is a plugin for vllm. Basically, the version of vllm-ascend is the same as the version of vllm. For example, if you use vllm 0.7.3, you should use vllm-ascend 0.7.3 as well. For main branch, we will make sure `vllm-ascend` and `vllm` are compatible by each commit.
@@ -84,7 +84,7 @@ Currently, w8a8 quantization is already supported by vllm-ascend originally on v
 
 ### 11. How to run w8a8 DeepSeek model?
 
-Please following the [quantization inferencing tutorail](https://vllm-ascend.readthedocs.io/en/main/tutorials/multi_npu_quantization.html) and replace model to DeepSeek.
+Please follow the [inferencing tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node.html) and replace the model with DeepSeek.
 
 ### 12. There is no output in log when loading models using vllm-ascend, How to solve it?
@@ -94,9 +94,9 @@ If you're using vllm 0.7.3 version, this is a known progress bar display issue i
 
 vllm-ascend is tested by functional test, performance test and accuracy test.
 
-- **Functional test**: we added CI, includes portion of vllm's native unit tests and vllm-ascend's own unit tests,on vllm-ascend's test, we test basic functionality、popular models availability and [supported features](https://vllm-ascend.readthedocs.io/en/latest/user_guide/suppoted_features.html) via e2e test
+- **Functional test**: we added CI, which includes a portion of vLLM's native unit tests and vllm-ascend's own unit tests; on the vllm-ascend side, we test basic functionality, popular model availability and [supported features](https://vllm-ascend.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html) via e2e tests
 
-- **Performance test**: we provide [benchmark](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks) tools for end-to-end performance benchmark which can easily to re-route locally, we'll publish a perf website like [vllm](https://simon-mo-workspace.observablehq.cloud/vllm-dashboard-v0/perf) does to show the performance test results for each pull request
+- **Performance test**: we provide [benchmark](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks) tools for end-to-end performance benchmarking which can easily be re-run locally; we'll publish a perf website to show the performance test results for each pull request
 
 - **Accuracy test**: we're working on adding accuracy test to CI as well.
 
diff --git a/docs/source/installation.md b/docs/source/installation.md
index 8f21aa8..3a98e0a 100644
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -12,7 +12,7 @@ This document describes how to install vllm-ascend manually.
 | Software | Supported version | Note |
 |---------------|----------------------------------|-------------------------------------------|
 | CANN | >= 8.1.RC1 | Required for vllm-ascend and torch-npu |
- | torch-npu | >= 2.5.1.post1.dev20250619 | Required for vllm-ascend |
+ | torch-npu | >= 2.5.1.post1.dev20250619 | Required for vllm-ascend; no need to install it manually, it will be installed automatically in the steps below |
 | torch | >= 2.5.1 | Required for torch-npu and vllm |
 
 You have 2 way to install:
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index f8c76c9..41dd05c 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,60 @@
 # Release note
 
+## v0.9.2rc1 - 2025.07.11
+
+This is the 1st release candidate of v0.9.2 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started. Starting from this release, the V1 engine is enabled by default, so there is no need to set `VLLM_USE_V1=1` any more. This is also the last release to support the V0 engine; V0 code will be cleaned up in the future.
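+
+A minimal sketch of what this change means in practice (the model name and version pins are illustrative only):
+
+```bash
+pip install vllm==0.9.2 vllm-ascend==0.9.2rc1
+# V1 is now the default engine; exporting VLLM_USE_V1=1 is no longer needed
+vllm serve Qwen/Qwen3-8B
+```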
+
+### Highlights
+- Pooling models work with the V1 engine now. You can give them a try with the Qwen3 embedding model (see the sketch after this list). [#1359](https://github.com/vllm-project/vllm-ascend/pull/1359)
+- The performance on Atlas 300I series has been improved. [#1591](https://github.com/vllm-project/vllm-ascend/pull/1591)
+- aclgraph mode works with MoE models now. Currently, only Qwen3 MoE is well tested. [#1381](https://github.com/vllm-project/vllm-ascend/pull/1381)
+- Pipeline parallelism works with the V1 engine now. [#1700](https://github.com/vllm-project/vllm-ascend/pull/1700)
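+
+A minimal sketch of serving a pooling model on the V1 engine (assuming vLLM's standard `--task embed` CLI flag; the model choice is illustrative):
+
+```bash
+# pooling/embedding models now run on the V1 engine
+vllm serve Qwen/Qwen3-Embedding-0.6B --task embed
+```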
+
+### Core
+- Ascend PyTorch adapter (torch_npu) has been upgraded to `2.5.1.post1.dev20250619`. Don’t forget to update it in your environment. [#1347](https://github.com/vllm-project/vllm-ascend/pull/1347)
+- The **GatherV3** error in **aclgraph** mode has been fixed. [#1416](https://github.com/vllm-project/vllm-ascend/pull/1416)
+- W8A8 quantization works on Atlas 300I series now. [#1560](https://github.com/vllm-project/vllm-ascend/pull/1560)
+- Fixed an accuracy problem when deploying models with parallel parameters. [#1678](https://github.com/vllm-project/vllm-ascend/pull/1678)
+- The pre-built wheel package now requires a lower glibc version, so users can install it via `pip install vllm-ascend` directly. [#1582](https://github.com/vllm-project/vllm-ascend/pull/1582)
+
+### Other
+- The official doc has been updated for a better reading experience. For example, more deployment tutorials have been added and user/developer docs have been updated. More guides are coming soon.
+- Fixed an accuracy problem for DeepSeek V3/R1 models with torchair graph in long-sequence predictions. [#1331](https://github.com/vllm-project/vllm-ascend/pull/1331)
+- A new env variable `VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP` has been added. It enables the fused allgather-experts kernel for DeepSeek V3/R1 models. The default value is `0`. [#1335](https://github.com/vllm-project/vllm-ascend/pull/1335)
+- A new env variable `VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION` has been added to improve the performance of topk-topp sampling. The default value is `0`; we'll consider enabling it by default in the future (see the sketch after this list). [#1732](https://github.com/vllm-project/vllm-ascend/pull/1732)
+- A batch of bugs has been fixed for the Data Parallelism case. [#1273](https://github.com/vllm-project/vllm-ascend/pull/1273) [#1322](https://github.com/vllm-project/vllm-ascend/pull/1322) [#1275](https://github.com/vllm-project/vllm-ascend/pull/1275) [#1478](https://github.com/vllm-project/vllm-ascend/pull/1478)
+- The DeepSeek performance has been improved. [#1194](https://github.com/vllm-project/vllm-ascend/pull/1194) [#1395](https://github.com/vllm-project/vllm-ascend/pull/1395) [#1380](https://github.com/vllm-project/vllm-ascend/pull/1380)
+- The Ascend scheduler works with prefix cache now. [#1446](https://github.com/vllm-project/vllm-ascend/pull/1446)
+- DeepSeek works with prefix cache now. [#1498](https://github.com/vllm-project/vllm-ascend/pull/1498)
+- Support prompt logprobs in V1 to recover ceval accuracy. [#1483](https://github.com/vllm-project/vllm-ascend/pull/1483)
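+
+A quick sketch of opting in to both variables described in the list above (each defaults to `0`; set it to `1` to enable):
+
+```bash
+# fused allgather-experts kernel for DeepSeek V3/R1 models
+export VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP=1
+# topk-topp sampling optimization
+export VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION=1
+```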
+
+### Known Issues
+- Pipeline parallelism does not work with Ray in this version. It'll be supported in the next release. [#1751](https://github.com/vllm-project/vllm-ascend/issues/1751)
+
 ## v0.9.1rc1 - 2025.06.22
 
 This is the 1st release candidate of v0.9.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.