[Doc] Add the release note for 0.7.3rc1 (#285)

Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2025-03-13 17:57:06 +08:00
committed by GitHub
parent 41aba1cfc1
commit c25631ec7b
8 changed files with 81 additions and 44 deletions

View File

@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
## vLLM Ascend Plugin versions
Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
- **Final releases**: will typically be released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: will typically be released **on demand**. Ending in rcN, a pre release is the Nth release candidate, published to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post
For example:
- `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
- `v0.7.3rc1`: will be the first pre version of vllm-ascend.
- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
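The version scheme above can be checked mechanically; the following is an illustrative sketch (not part of the release tooling) that parses a tag into its components:

```python
import re

# Matches v[major].[minor].[micro][rcN][.postN], e.g. v0.7.3rc1 or v0.7.3.post1
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

def parse_version(tag):
    """Return the matched components of a vllm-ascend version tag, or None."""
    m = VERSION_RE.match(tag)
    return m.groupdict() if m else None

print(parse_version("v0.7.3rc1"))
# {'major': '0', 'minor': '7', 'micro': '3', 'rc': '1', 'post': None}
```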
## Branch policy
vllm-ascend has a main branch and dev branches.
- **main**: the main branch corresponds to the vLLM main branch and is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.
Usually, a commit should be merged into the main branch FIRST and only then backported to the dev branch, to keep maintenance costs as low as possible.
@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|--------------|--------------| --- | --- | --- |
| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
## Release cadence
### Next final release (`v0.7.x`) window
### Next final release (`v0.7.3`) window
| Date | Event |
|------------|------------------------------------------------------------------|
| March 2025 | Release candidates, v0.7.3rc1 |
| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
| Date | Event |
|------------|-------------------------------------------|
| 2025.03.14 | Release candidates, v0.7.3rc1 |
| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
| 2025.03.30 | Final release, v0.7.3 |

View File

@@ -3,6 +3,7 @@
## Version Specific FAQs
- [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
## General FAQs

View File

@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```
:::{note}
If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not supported by vllm-ascend currently.
:::
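The edit described in the note above can be scripted; this is an illustrative sketch (the path to `config.json` is a placeholder you must adapt to your local weights directory):

```python
import json
from pathlib import Path

def drop_quantization_config(config_path):
    """Remove the unsupported `quantization_config` section from a model's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if config.pop("quantization_config", None) is not None:
        path.write_text(json.dumps(config, indent=2))
        print(f"Removed quantization_config from {path}")

# Adapt the path to your local model weights directory:
# drop_quantization_config("/path/to/DeepSeek-R1/config.json")
```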
Start the vLLM server on head node:
```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
```
```

View File

@@ -1,4 +1,4 @@
# Single NPU (Qwen 7B)
# Single NPU (Qwen2.5 7B)
## Run vllm-ascend on Single NPU

View File

@@ -1,4 +1,4 @@
# Single NPU (Qwen2.5-VL-7B)
# Single NPU (Qwen2.5-VL 7B)
## Run vllm-ascend on Single NPU

View File

@@ -1,5 +1,33 @@
# Release note
## v0.7.3rc1
🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
### Highlights
- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
### Core
- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
- Added initial support for pooling models. BERT-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
### Model
- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
- MiniCPM is now supported [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
### Other
- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1 [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen, and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
### Known issues
- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the output may be inaccurate. We are working on it; it'll be fixed in the next release.
- Garbled model output has been improved and reduced. If you still hit the issue, try changing generation config values, such as `temperature`, and try again. There is also a known issue, listed above. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
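For the garbled-output workaround above, adjusting `temperature` goes through the OpenAI-compatible completions API. A minimal sketch, assuming a vLLM server is already running locally; the model name is a placeholder:

```python
import json
from urllib import request

def completion_payload(prompt, temperature=0.6):
    """Build a /v1/completions request body; lower temperature if output is garbled."""
    return {
        "model": "Qwen/Qwen2.5-7B-Instruct",  # placeholder, use your served model name
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": temperature,
    }

# With a vLLM server running locally, send the request like:
# body = json.dumps(completion_payload("Hello", temperature=0.2)).encode()
# req = request.Request("http://127.0.0.1:8000/v1/completions", data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode())
```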
## v0.7.1rc1
🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
### Highlights

View File

@@ -2,25 +2,27 @@
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ ||
| DeepSeek v3 | ✅ ||
| DeepSeek R1 | ✅ ||
| DeepSeek Distill (Qwen/LLama) | ✅ ||
| Qwen2-VL | ✅ ||
| Qwen2-Audio | ✅ ||
| Qwen2.5 | ✅ ||
| Qwen2.5-VL | ✅ ||
| MiniCPM |✅| |
| LLama3.1/3.2 | ✅ ||
| Mistral | | Need test |
| DeepSeek v2.5 | |Need test |
| DeepSeek v3 | ✅ ||
| DeepSeek Distill (Qwen/Llama) | ✅ ||
| LLama3.1/3.2 | ✅ ||
| Gemma-2 | |Need test|
| baichuan | |Need test|
| minicpm | |Need test|
| internlm | ||
| ChatGLM | ✅ ||
| InternVL 2.5 | ✅ ||
| Qwen2-VL | ✅ ||
| Baichuan | |Need test|
| Internlm | ||
| ChatGLM | | Plan in Q2|
| InternVL2.5 | ✅ ||
| GLM-4v | |Need test|
| Molmo | ✅ ||
| LLaVA 1.5 | ✅ ||
| LLaVA1.5 | | Need test|
| Mllama | |Need test|
| LLaVA-Next | |Need test|
| LLaVA-Next-Video | |Need test|
| Phi-3-Vision/Phi-3.5-Vision | |Need test|
| Ultravox | |Need test|
| Qwen2-Audio | ✅ ||

View File

@@ -1,21 +1,21 @@
# Feature Support
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Plan in 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q2 |
| LoRA | ✗ | Plan in 2025 Q1 |
| Prompt adapter | ✗ | Plan in 2025 Q1 |
| Speculative decoding | ✗ | Plan in 2025 Q1 |
| Pooling | ✅ | |
| Enc-dec | ✗ | Plan in 2025 Q2 |
| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
| LogProbs | ✅ ||
| Prompt logProbs | ✅ ||
| Async output | ✅ ||
| Multi step scheduler | ✗ | Plan in 2025 Q1 |
| Best of | ✅ ||
| Beam search | ✅ ||
| Guided Decoding | ✅ | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | Only "mp" supported now |
| Pipeline Parallel | ✅ | Only "mp" supported now |
| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
| Speculative decoding | ✅ | | | Basic functions available | Needs full testing |
| Pooling | ✅ | | | Basic functions available (BERT) | Needs full testing and support for more models |
| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
| Multi Modality | ✅ | | ✅ | Basic functions available (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Improve performance and add support for more models |
| LogProbs | ✅ | | | Basic functions available | Needs full testing |
| Prompt logProbs | ✅ | | | Basic functions available | Needs full testing |
| Async output | ✅ | | | Basic functions available | Needs full testing |
| Multi step scheduler | ✅ | | | Basic functions available | Needs full testing |
| Best of | ✅ | | | Basic functions available | Needs full testing |
| Beam search | ✅ | | | Basic functions available | Needs full testing |
| Guided Decoding | ✅ | | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | | | Basic functions available | Needs full testing |
| Pipeline Parallel | ✅ | | | Basic functions available | Needs full testing |