From c25631ec7b4c8d5ff4f23b9bf0b13954c81632e0 Mon Sep 17 00:00:00 2001
From: wangxiyuan
Date: Thu, 13 Mar 2025 17:57:06 +0800
Subject: [PATCH] [Doc] Add the release note for 0.7.3rc1 (#285)

Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan
---
 .../developer_guide/versioning_policy.md      | 20 +++++-----
 docs/source/faqs.md                           |  1 +
 docs/source/tutorials/multi_node.md           |  6 ++-
 docs/source/tutorials/single_npu.md           |  2 +-
 .../source/tutorials/single_npu_multimodal.md |  2 +-
 docs/source/user_guide/release_notes.md       | 30 ++++++++++++++-
 docs/source/user_guide/supported_models.md    | 26 +++++++------
 docs/source/user_guide/suppoted_features.md   | 38 +++++++++----------
 8 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/docs/source/developer_guide/versioning_policy.md b/docs/source/developer_guide/versioning_policy.md
index 3ba5908..2e09801 100644
--- a/docs/source/developer_guide/versioning_policy.md
+++ b/docs/source/developer_guide/versioning_policy.md
@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
 ## vLLM Ascend Plugin versions
 
 Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
-`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
+`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
 
 - **Final releases**: will typically be released every **3 months**, will take the vLLM upstream release plan and Ascend software product release plan into comprehensive consideration.
 - **Pre releases**: will typically be released **on demand**, ending with rcN, represents the Nth release candidate version, to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post
 For example:
 
 - `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
-- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
-- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
+- `v0.7.3rc1`: will be the first pre-release of vllm-ascend.
+- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
 
 ## Branch policy
 
 vllm-ascend has main branch and dev branch.
 
 - **main**: main branch,corresponds to the vLLM main branch, and is continuously monitored for quality through Ascend CI.
-- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
+- **vX.Y.Z-dev**: development branch, created for certain new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.
 
 Usually, a commit should be ONLY first merged in the main branch, and then backported to the dev branch to reduce maintenance costs as much as possible.
 
@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |--------------|--------------| --- | --- | --- |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
 | v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
 
 ## Release cadence
 
-### Next final release (`v0.7.x`) window
+### Next final release (`v0.7.3`) window
 
-| Date | Event |
-|------------|------------------------------------------------------------------|
-| March 2025 | Release candidates, v0.7.3rc1 |
-| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
+| Date | Event |
+|------------|-------------------------------------------|
+| 2025.03.14 | Release candidates, v0.7.3rc1 |
+| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
+| 2025.03.30 | Final release, v0.7.3 |
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index bb4ba5f..3f466b2 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -3,6 +3,7 @@
 
 ## Version Specific FAQs
 
 - [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
+- [[v0.7.3rc1] FAQ & 
Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
 
 ## General FAQs
diff --git a/docs/source/tutorials/multi_node.md b/docs/source/tutorials/multi_node.md
index 324f414..d674d8a 100644
--- a/docs/source/tutorials/multi_node.md
+++ b/docs/source/tutorials/multi_node.md
@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
 ```
 
+:::{note}
+If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not supported by vllm-ascend currently.
+:::
+
 Start the vLLM server on head node:
 
 ```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
 ```
 INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
 INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
-```
\ No newline at end of file
+```
diff --git a/docs/source/tutorials/single_npu.md b/docs/source/tutorials/single_npu.md
index 63e9331..445d951 100644
--- a/docs/source/tutorials/single_npu.md
+++ b/docs/source/tutorials/single_npu.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen 7B)
+# Single NPU (Qwen2.5 7B)
 
 ## Run vllm-ascend on Single NPU
diff --git a/docs/source/tutorials/single_npu_multimodal.md b/docs/source/tutorials/single_npu_multimodal.md
index c893090..3b01397 100644
--- a/docs/source/tutorials/single_npu_multimodal.md
+++ b/docs/source/tutorials/single_npu_multimodal.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5-VL-7B)
+# Single NPU (Qwen2.5-VL 7B)
 
 ## Run vllm-ascend on Single NPU
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index 28849f6..a1d70b2 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,33 @@
 # Release note
 
+## v0.7.3rc1
+
+πŸŽ‰ Hello, World! 
This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
+- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
+
+### Highlights
+- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
+- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
+- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
+
+### Core
+- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
+- Added initial support for pooling models. Bert-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
+
+### Model
+- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
+- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
+
+### Other
+- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1 [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
+- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
+- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
+
+### Known issues
+- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the output may be inaccurate. We are working on it. It'll be fixed in the next release.
+- Improved and reduced the garbled text in model output. 
But if you still hit the issue, try changing the generation config values, such as `temperature`, and try again. There is also a known issue shown below. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
+
 ## v0.7.1rc1
 
 πŸŽ‰ Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
 
 vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
 
-Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
+Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. 
We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
 
 ### Highlights
diff --git a/docs/source/user_guide/supported_models.md b/docs/source/user_guide/supported_models.md
index 48b9cee..06dc0cd 100644
--- a/docs/source/user_guide/supported_models.md
+++ b/docs/source/user_guide/supported_models.md
@@ -2,25 +2,27 @@
 
 | Model | Supported | Note |
 |---------|-----------|------|
-| Qwen 2.5 | βœ… ||
+| DeepSeek v3 | βœ… ||
+| DeepSeek R1 | βœ… ||
+| DeepSeek Distill (Qwen/Llama) |βœ…||
+| Qwen2-VL | βœ… ||
+| Qwen2-Audio | βœ… ||
+| Qwen2.5 | βœ… ||
+| Qwen2.5-VL | βœ… ||
+| MiniCPM |βœ…| |
+| Llama3.1/3.2 | βœ… ||
 | Mistral | | Need test |
 | DeepSeek v2.5 | |Need test |
-| DeepSeek v3 | βœ…|||
-| DeepSeek Distill (Qwen/llama) |βœ…||
-| LLama3.1/3.2 | βœ… ||
 | Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | βœ… ||
-| ChatGLM | βœ… ||
-| InternVL 2.5 | βœ… ||
-| Qwen2-VL | βœ… ||
+| Baichuan | |Need test|
+| InternLM | βœ… ||
+| ChatGLM | ❌ | Plan in Q2|
+| InternVL2.5 | βœ… ||
 | GLM-4v | |Need test|
 | Molomo | βœ… ||
-| LLaVA 1.5 | βœ… ||
+| LLaVA1.5 | | Need test|
 | Mllama | |Need test|
 | LLaVA-Next | |Need test|
 | LLaVA-Next-Video | |Need test|
 | Phi-3-Vison/Phi-3.5-Vison | |Need test|
 | Ultravox | |Need test|
-| Qwen2-Audio | βœ… ||
diff --git a/docs/source/user_guide/suppoted_features.md b/docs/source/user_guide/suppoted_features.md
index b864ef6..94e8768 100644
--- a/docs/source/user_guide/suppoted_features.md
+++ b/docs/source/user_guide/suppoted_features.md
@@ -1,21 +1,21 @@
 # Feature Support
 
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | βœ— | Plan in 2025 Q1 |
-| Automatic Prefix Caching | βœ… | Improve performance in 2025 Q2 |
-| LoRA | βœ— | Plan in 2025 Q1 |
-| Prompt adapter | βœ— | Plan in 2025 Q1 |
-| Speculative decoding | βœ— | Plan in 2025 Q1 |
-| Pooling | βœ… | |
-| Enc-dec | βœ— | Plan in 2025 Q2 |
-| Multi Modality | βœ… 
(LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | βœ… ||
-| Prompt logProbs | βœ… ||
-| Async output | βœ… ||
-| Multi step scheduler | βœ— | Plan in 2025 Q1 |
-| Best of | βœ… ||
-| Beam search | βœ… ||
-| Guided Decoding | βœ… | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel | βœ… | Only "mp" supported now |
-| Pipeline Parallel | βœ… | Only "mp" supported now |
+| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
+|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
+| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
+| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
+| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
+| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
+| Speculative decoding | βœ… | | | Basic functions available | Needs full testing |
+| Pooling | βœ… | | | Basic functions available (BERT) | Needs full testing and more model support |
+| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
+| Multi Modality | βœ… | | βœ… | Basic functions available (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Improve performance and add more model support |
+| LogProbs | βœ… | | | Basic functions available | Needs full testing |
+| Prompt logProbs | βœ… | | | Basic functions available | Needs full testing |
+| Async output | βœ… | | | Basic functions available | Needs full testing |
+| Multi step scheduler | βœ… | | | Basic functions available | Needs full testing |
+| Best of | βœ… | | | Basic functions available | Needs full testing |
+| Beam search | βœ… | | | Basic functions available | Needs full testing |
+| Guided Decoding | βœ… | | | Basic functions available | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
+| Tensor Parallel | βœ… | | | Basic functions available | Needs full testing |
+| Pipeline Parallel | βœ… | | | Basic functions 
available | Needs full testing |
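The multi-node tutorial note added above asks DeepSeek V3/R1 users to remove the `quantization_config` section from the model's `config.json` before serving. A minimal sketch of automating that edit with the Python standard library; the `strip_quantization_config` helper and the `model_dir` argument are illustrative, not part of vllm-ascend:

```python
import json
from pathlib import Path


def strip_quantization_config(model_dir: str) -> bool:
    """Remove the `quantization_config` section from a model's config.json.

    Returns True if the section was present and removed, False otherwise.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    # Drop the unsupported section if it exists; leave everything else intact.
    removed = config.pop("quantization_config", None) is not None
    if removed:
        config_path.write_text(json.dumps(config, indent=2))
    return removed
```

Run it once against the local model directory before starting the vLLM server on the head node.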