diff --git a/docs/source/developer_guide/versioning_policy.md b/docs/source/developer_guide/versioning_policy.md
index 3ba5908..2e09801 100644
--- a/docs/source/developer_guide/versioning_policy.md
+++ b/docs/source/developer_guide/versioning_policy.md
@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
## vLLM Ascend Plugin versions
Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
-`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
+`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
- **Final releases**: will typically be released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: will typically be released **on demand**, ending with `rcN`, which denotes the Nth release candidate, to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post
For example:
- `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
-- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
-- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
+- `v0.7.3rc1`: will be the first pre-release version of vllm-ascend.
+- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
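The `v[major].[minor].[micro][rcN][.postN]` scheme above can be sketched as a small parser. This is an illustrative helper only — the regex and the `parse_version` name are not part of vllm-ascend:

```python
import re

# Sketch of the v[major].[minor].[micro][rcN][.postN] scheme described above.
# The regex and helper name are illustrative, not part of vllm-ascend.
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?"          # optional release-candidate suffix
    r"(?:\.post(?P<post>\d+))?$"   # optional post-release suffix
)

def parse_version(tag: str) -> dict:
    """Split a vllm-ascend version tag into its numeric components."""
    m = VERSION_RE.match(tag)
    if m is None:
        raise ValueError(f"not a valid vllm-ascend version tag: {tag}")
    return {k: int(v) if v is not None else None for k, v in m.groupdict().items()}
```

For example, `parse_version("v0.7.3rc1")` yields `rc` = 1 with no `post` component, while a final release such as `v0.7.3` has neither.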
## Branch policy
vllm-ascend has a main branch and dev branches.
- **main**: main branch, corresponds to the vLLM main branch, and is continuously monitored for quality through Ascend CI.
-- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
+- **vX.Y.Z-dev**: development branch, created alongside certain new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for the vLLM `v0.7.3` version.

Usually, a commit should ONLY be merged into the main branch first, and then backported to the dev branch, to reduce maintenance costs as much as possible.
@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|--------------|--------------| --- | --- | --- |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
## Release cadence
-### Next final release (`v0.7.x`) window
+### Next final release (`v0.7.3`) window
-| Date | Event |
-|------------|------------------------------------------------------------------|
-| March 2025 | Release candidates, v0.7.3rc1 |
-| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
+| Date | Event |
+|------------|-------------------------------------------|
+| 2025.03.14 | Release candidates, v0.7.3rc1 |
+| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
+| 2025.03.30 | Final release, v0.7.3 |
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index bb4ba5f..3f466b2 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -3,6 +3,7 @@
## Version Specific FAQs
- [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
+- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
## General FAQs
diff --git a/docs/source/tutorials/multi_node.md b/docs/source/tutorials/multi_node.md
index 324f414..d674d8a 100644
--- a/docs/source/tutorials/multi_node.md
+++ b/docs/source/tutorials/multi_node.md
@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```
+:::{note}
+If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not supported by vllm-ascend currently.
+:::
+
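The note above can be automated before starting the server. This is a minimal sketch; the `strip_quantization_config` helper is hypothetical and you would point it at your local checkpoint directory:

```python
import json
from pathlib import Path

def strip_quantization_config(model_dir: str) -> bool:
    """Remove the unsupported `quantization_config` section from a model's
    config.json. Returns True if a section was removed.
    Hypothetical helper, not part of vllm-ascend."""
    cfg_path = Path(model_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text())
    if "quantization_config" not in cfg:
        return False  # nothing to remove
    del cfg["quantization_config"]
    cfg_path.write_text(json.dumps(cfg, indent=2))
    return True
```

Run it once against the DeepSeek V3/R1 checkpoint directory on every node before launching vLLM.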
Start the vLLM server on head node:
```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
-```
\ No newline at end of file
+```
diff --git a/docs/source/tutorials/single_npu.md b/docs/source/tutorials/single_npu.md
index 63e9331..445d951 100644
--- a/docs/source/tutorials/single_npu.md
+++ b/docs/source/tutorials/single_npu.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen 7B)
+# Single NPU (Qwen2.5 7B)
## Run vllm-ascend on Single NPU
diff --git a/docs/source/tutorials/single_npu_multimodal.md b/docs/source/tutorials/single_npu_multimodal.md
index c893090..3b01397 100644
--- a/docs/source/tutorials/single_npu_multimodal.md
+++ b/docs/source/tutorials/single_npu_multimodal.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5-VL-7B)
+# Single NPU (Qwen2.5-VL 7B)
## Run vllm-ascend on Single NPU
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index 28849f6..a1d70b2 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,33 @@
# Release note
+## v0.7.3rc1
+
+🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
+- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
+
+### Highlights
+- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
+- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
+- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
+
+### Core
+- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
+- Added initial support for pooling models. BERT-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
+
+### Model
+- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
+- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
+
+### Other
+- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1 [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
+- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
+- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
+
+### Known issues
+- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the output may be inaccurate. We are working on it; it'll be fixed in the next release.
+- Garbled characters in model output have been reduced. If you still hit the issue, try changing generation config values, such as `temperature`, and try again. There is also a known issue shown below. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
+
## v0.7.1rc1
🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
-Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
+Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
### Highlights
diff --git a/docs/source/user_guide/supported_models.md b/docs/source/user_guide/supported_models.md
index 48b9cee..06dc0cd 100644
--- a/docs/source/user_guide/supported_models.md
+++ b/docs/source/user_guide/supported_models.md
@@ -2,25 +2,27 @@
| Model | Supported | Note |
|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
+| DeepSeek v3 | ✅ ||
+| DeepSeek R1 | ✅ ||
+| DeepSeek Distill (Qwen/LLama) | ✅ ||
+| Qwen2-VL | ✅ ||
+| Qwen2-Audio | ✅ ||
+| Qwen2.5 | ✅ ||
+| Qwen2.5-VL | ✅ ||
+| MiniCPM | ✅ ||
+| LLama3.1/3.2 | ✅ ||
 | Mistral | | Need test |
 | DeepSeek v2.5 | |Need test |
-| DeepSeek v3 | ✅ ||
-| DeepSeek Distill (Qwen/llama) | ✅ ||
-| LLama3.1/3.2 | ✅ ||
 | Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
+| Baichuan | |Need test|
+| Internlm | ✅ ||
+| ChatGLM | ❌ | Plan in Q2|
+| InternVL2.5 | ✅ ||
 | GLM-4v | |Need test|
 | Molomo | ✅ ||
-| LLaVA 1.5 | ✅ ||
+| LLaVA1.5 | | Need test|
 | Mllama | |Need test|
 | LLaVA-Next | |Need test|
 | LLaVA-Next-Video | |Need test|
 | Phi-3-Vison/Phi-3.5-Vison | |Need test|
 | Ultravox | |Need test|
-| Qwen2-Audio | ✅ ||
diff --git a/docs/source/user_guide/suppoted_features.md b/docs/source/user_guide/suppoted_features.md
index b864ef6..94e8768 100644
--- a/docs/source/user_guide/suppoted_features.md
+++ b/docs/source/user_guide/suppoted_features.md
@@ -1,21 +1,21 @@
# Feature Support
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ❌ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q2 |
-| LoRA | ❌ | Plan in 2025 Q1 |
-| Prompt adapter | ❌ | Plan in 2025 Q1 |
-| Speculative decoding | ❌ | Plan in 2025 Q1 |
-| Pooling | ✅ | |
-| Enc-dec | ❌ | Plan in 2025 Q2 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ❌ | Plan in 2025 Q1 |
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✅ | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel | ✅ | Only "mp" supported now |
-| Pipeline Parallel | ✅ | Only "mp" supported now |
+| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
+|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
+| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
+| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
+| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
+| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
+| Speculative decoding | ✅ | | | Basic functions available | Needs full testing |
+| Pooling | ✅ | | | Basic functions available (BERT) | Needs full testing; add support for more models |
+| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
+| Multi Modality | ✅ | | ✅ | Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance; add support for more models |
+| LogProbs | ✅ | | | Basic functions available | Needs full testing |
+| Prompt logProbs | ✅ | | | Basic functions available | Needs full testing |
+| Async output | ✅ | | | Basic functions available | Needs full testing |
+| Multi step scheduler | ✅ | | | Basic functions available | Needs full testing |
+| Best of | ✅ | | | Basic functions available | Needs full testing |
+| Beam search | ✅ | | | Basic functions available | Needs full testing |
+| Guided Decoding | ✅ | | | Basic functions available | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
+| Tensor Parallel | ✅ | | | Basic functions available | Needs full testing |
+| Pipeline Parallel | ✅ | | | Basic functions available | Needs full testing |