[Doc] Add the release note for 0.7.3rc1 (#285)

Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2025-03-13 17:57:06 +08:00
committed by GitHub
parent 41aba1cfc1
commit c25631ec7b
8 changed files with 81 additions and 44 deletions

View File

@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
## vLLM Ascend Plugin versions
Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
- **Final releases**: will typically be released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: will typically be released **on demand**. Ending in rcN, a pre release is the Nth release candidate, published to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post
For example:
- `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
- `v0.7.3rc1`: will be the first pre version of vllm-ascend.
- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
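The version scheme above can be checked mechanically; the following is an illustrative sketch (not part of the release tooling) that parses a tag into its components:

```python
import re

# Matches v[major].[minor].[micro][rcN][.postN], e.g. v0.7.3rc1 or v0.7.3.post1
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

def parse_version(tag):
    """Return the matched components of a vllm-ascend version tag, or None."""
    m = VERSION_RE.match(tag)
    return m.groupdict() if m else None

print(parse_version("v0.7.3rc1"))
# {'major': '0', 'minor': '7', 'micro': '3', 'rc': '1', 'post': None}
```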
## Branch policy
vllm-ascend has a main branch and dev branches.
- **main**: the main branch corresponds to the vLLM main branch and is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.
Usually, a commit should be merged into the main branch FIRST and only then backported to the dev branch, to keep maintenance costs as low as possible.
@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|--------------|--------------| --- | --- | --- |
| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
## Release cadence
### Next final release (`v0.7.x`) window
### Next final release (`v0.7.3`) window
| Date | Event |
|------------|------------------------------------------------------------------|
| March 2025 | Release candidates, v0.7.3rc1 |
| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
| Date | Event |
|------------|-------------------------------------------|
| 2025.03.14 | Release candidates, v0.7.3rc1 |
| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
| 2025.03.30 | Final release, v0.7.3 |

View File

@@ -3,6 +3,7 @@
## Version Specific FAQs
- [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
## General FAQs

View File

@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```
:::{note}
If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not supported by vllm-ascend currently.
:::
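The edit described in the note above can be scripted; this is an illustrative sketch (the path to `config.json` is a placeholder you must adapt to your local weights directory):

```python
import json
from pathlib import Path

def drop_quantization_config(config_path):
    """Remove the unsupported `quantization_config` section from a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if config.pop("quantization_config", None) is not None:
        path.write_text(json.dumps(config, indent=2))
        print(f"Removed quantization_config from {path}")

# Adapt the path to your local model weights directory:
# drop_quantization_config("/path/to/DeepSeek-R1/config.json")
```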
Start the vLLM server on head node:
```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
```
```

View File

@@ -1,4 +1,4 @@
# Single NPU (Qwen 7B)
# Single NPU (Qwen2.5 7B)
## Run vllm-ascend on Single NPU

View File

@@ -1,4 +1,4 @@
# Single NPU (Qwen2.5-VL-7B)
# Single NPU (Qwen2.5-VL 7B)
## Run vllm-ascend on Single NPU

View File

@@ -1,5 +1,33 @@
# Release note
## v0.7.3rc1
🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
### Highlights
- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
### Core
- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
- Added initial support for pooling models. BERT-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
### Model
- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
- MiniCPM is now supported [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
### Other
- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1 [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen, and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
### Known issues
- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the output may be inaccurate. We are working on it; it'll be fixed in the next release.
- Garbled model output has been improved and reduced. If you still hit the issue, try changing generation config values, such as `temperature`, and try again. There is also a known issue, listed above. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
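For the garbled-output workaround above, adjusting `temperature` goes through the OpenAI-compatible completions API. A minimal sketch, assuming a vLLM server is already running locally; the model name is a placeholder:

```python
import json
from urllib import request

def completion_payload(prompt, temperature=0.6):
    """Build a /v1/completions request body; lower temperature if output is garbled."""
    return {
        "model": "Qwen/Qwen2.5-7B-Instruct",  # placeholder, use your served model name
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": temperature,
    }

# With a vLLM server running locally, send the request like:
# body = json.dumps(completion_payload("Hello", temperature=0.2)).encode()
# req = request.Request("http://127.0.0.1:8000/v1/completions", data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode())
```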
## v0.7.1rc1
🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
### Highlights

View File

@@ -2,25 +2,27 @@
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ ||
| DeepSeek v3 | ✅ ||
| DeepSeek R1 | ✅ ||
| DeepSeek Distill (Qwen/LLama) | ✅ ||
| Qwen2-VL | ✅ ||
| Qwen2-Audio | ✅ ||
| Qwen2.5 | ✅ ||
| Qwen2.5-VL | ✅ ||
| MiniCPM |✅| |
| LLama3.1/3.2 | ✅ ||
| Mistral | | Need test |
| DeepSeek v2.5 | |Need test |
| DeepSeek v3 | ✅ ||
| DeepSeek Distill (Qwen/Llama) | ✅ ||
| LLama3.1/3.2 | ✅ ||
| Gemma-2 | |Need test|
| baichuan | |Need test|
| minicpm | |Need test|
| internlm | ||
| ChatGLM | ✅ ||
| InternVL 2.5 | ✅ ||
| Qwen2-VL | ✅ ||
| Baichuan | |Need test|
| Internlm | ||
| ChatGLM | | Plan in Q2|
| InternVL2.5 | ✅ ||
| GLM-4v | |Need test|
| Molmo | ✅ ||
| LLaVA 1.5 | ✅ ||
| LLaVA1.5 | | Need test|
| Mllama | |Need test|
| LLaVA-Next | |Need test|
| LLaVA-Next-Video | |Need test|
| Phi-3-Vision/Phi-3.5-Vision | |Need test|
| Ultravox | |Need test|
| Qwen2-Audio | ✅ ||

View File

@@ -1,21 +1,21 @@
# Feature Support
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Plan in 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q2 |
| LoRA | ✗ | Plan in 2025 Q1 |
| Prompt adapter | ✗ | Plan in 2025 Q1 |
| Speculative decoding | ✗ | Plan in 2025 Q1 |
| Pooling | ✅ | |
| Enc-dec | ✗ | Plan in 2025 Q2 |
| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
| LogProbs | ✅ ||
| Prompt logProbs | ✅ ||
| Async output | ✅ ||
| Multi step scheduler | ✗ | Plan in 2025 Q1 |
| Best of | ✅ ||
| Beam search | ✅ ||
| Guided Decoding | ✅ | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | Only "mp" supported now |
| Pipeline Parallel | ✅ | Only "mp" supported now |
| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
| Speculative decoding | ✅ | | | Basic functions available | Needs full testing |
| Pooling | ✅ | | | Basic functions available (BERT) | Needs full testing and support for more models |
| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
| Multi Modality | ✅ | | ✅ | Basic functions available (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Improve performance and add support for more models |
| LogProbs | ✅ | | | Basic functions available | Needs full testing |
| Prompt logProbs | ✅ | | | Basic functions available | Needs full testing |
| Async output | ✅ | | | Basic functions available | Needs full testing |
| Multi step scheduler | ✅ | | | Basic functions available | Needs full testing |
| Best of | ✅ | | | Basic functions available | Needs full testing |
| Beam search | ✅ | | | Basic functions available | Needs full testing |
| Guided Decoding | ✅ | | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | | | Basic functions available | Needs full testing |
| Pipeline Parallel | ✅ | | | Basic functions available | Needs full testing |