[Doc] Add the release note for 0.7.3rc1 (#285)
Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt

## vLLM Ascend Plugin versions

Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)

- **Final releases**: will typically be released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: will typically be released **on demand**, ending with rcN, which denotes the Nth release candidate, to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post

For example:
- `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
- `v0.7.1rc1`: will be the first pre-release version of vllm-ascend.
- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
- `v0.7.3rc1`: will be the first pre-release version of vllm-ascend.
- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
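The scheme above can also be checked mechanically. A minimal sketch in Python (the `parse_release` helper and its regex are illustrative, not part of vllm-ascend):

```python
import re

# Pattern for v[major].[minor].[micro][rcN][.postN] as described above.
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

def parse_release(tag: str) -> dict:
    """Split a vllm-ascend release tag into its version components."""
    m = VERSION_RE.match(tag)
    if m is None:
        raise ValueError(f"not a valid release tag: {tag}")
    return {k: (int(v) if v is not None else None) for k, v in m.groupdict().items()}

# {'major': 0, 'minor': 7, 'micro': 3, 'rc': 1, 'post': None}
print(parse_release("v0.7.3rc1"))
```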

## Branch policy

vllm-ascend has a main branch and dev branches.

- **main**: main branch, corresponds to the vLLM main branch, and is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: development branch, created for certain new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
- **vX.Y.Z-dev**: development branch, created for certain new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.

Usually, a commit should first be merged ONLY into the main branch and then backported to the dev branch, to reduce maintenance costs as much as possible.

@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:

| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|--------------|--------------| --- | --- | --- |
| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |

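The Python range in the matrix above can be verified at startup; a minimal sketch (the `python_supported` helper is illustrative, not part of vllm-ascend):

```python
import sys

def python_supported(version_info=sys.version_info) -> bool:
    """Check an interpreter version against the 3.9 - 3.12 range pinned above."""
    return (3, 9) <= (version_info.major, version_info.minor) <= (3, 12)

# Prints True on a supported interpreter, False otherwise.
print(python_supported())
```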
## Release cadence

### Next final release (`v0.7.x`) window
### Next final release (`v0.7.3`) window

| Date | Event |
|------------|------------------------------------------------------------------|
| March 2025 | Release candidates, v0.7.3rc1 |
| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
| Date | Event |
|------------|-------------------------------------------|
| 2025.03.14 | Release candidates, v0.7.3rc1 |
| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
| 2025.03.30 | Final release, v0.7.3 |

@@ -3,6 +3,7 @@

## Version Specific FAQs

- [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
- [[v0.7.3rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)

## General FAQs

@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
```

:::{note}
If you're running DeepSeek V3/R1, please remove the `quantization_config` section in the `config.json` file since it's not supported by vllm-ascend currently.
:::
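That edit can be scripted instead of done by hand; a minimal sketch (the `strip_quantization_config` helper and the example path are illustrative, not part of vllm-ascend):

```python
import json
from pathlib import Path

def strip_quantization_config(config_path: str) -> None:
    """Remove the unsupported `quantization_config` section from config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Drop the quantization settings if present; leave everything else intact.
    if config.pop("quantization_config", None) is not None:
        path.write_text(json.dumps(config, indent=2))

# Usage (path to the downloaded model weights is illustrative):
# strip_quantization_config("DeepSeek-R1/config.json")
```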

Start the vLLM server on head node:

```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
```
INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
```

@@ -1,4 +1,4 @@
# Single NPU (Qwen 7B)
# Single NPU (Qwen2.5 7B)

## Run vllm-ascend on Single NPU

@@ -1,4 +1,4 @@
# Single NPU (Qwen2.5-VL-7B)
# Single NPU (Qwen2.5-VL 7B)

## Run vllm-ascend on Single NPU

@@ -1,5 +1,33 @@
# Release note

## v0.7.3rc1

🎉 Hello, World! This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html

### Highlights
- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)

### Core
- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
- Added initial support for pooling models. BERT-based models such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3` work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)

### Model
- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)

### Other
- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1. [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807

### Known issues
- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the accuracy of the output may be incorrect. We are working on it; it'll be fixed in the next release.
- Garbled characters in model output have been reduced. If you still hit the issue, try changing generation config values, such as `temperature`, and try again. There is also a known issue shown below. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)

## v0.7.1rc1

🎉 Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend

vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.

Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19).
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19).

### Highlights

@@ -2,25 +2,27 @@

| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ ||
| DeepSeek v3 | ✅ ||
| DeepSeek R1 | ✅ ||
| DeepSeek Distill (Qwen/LLama) | ✅ ||
| Qwen2-VL | ✅ ||
| Qwen2-Audio | ✅ ||
| Qwen2.5 | ✅ ||
| Qwen2.5-VL | ✅ ||
| MiniCPM | ✅ ||
| LLama3.1/3.2 | ✅ ||
| Mistral | | Need test |
| DeepSeek v2.5 | | Need test |
| DeepSeek v3 | ✅ ||
| DeepSeek Distill (Qwen/llama) | ✅ ||
| LLama3.1/3.2 | ✅ ||
| Gemma-2 | | Need test |
| baichuan | | Need test |
| minicpm | | Need test |
| internlm | ✅ ||
| ChatGLM | ✅ ||
| InternVL 2.5 | ✅ ||
| Qwen2-VL | ✅ ||
| Baichuan | | Need test |
| Internlm | ✅ ||
| ChatGLM | ❌ | Plan in Q2 |
| InternVL2.5 | ✅ ||
| GLM-4v | | Need test |
| Molmo | ✅ ||
| LLaVA 1.5 | ✅ ||
| LLaVA1.5 | | Need test |
| Mllama | | Need test |
| LLaVA-Next | | Need test |
| LLaVA-Next-Video | | Need test |
| Phi-3-Vision/Phi-3.5-Vision | | Need test |
| Ultravox | | Need test |
| Qwen2-Audio | ✅ ||

@@ -1,21 +1,21 @@
# Feature Support

| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Plan in 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q2 |
| LoRA | ✗ | Plan in 2025 Q1 |
| Prompt adapter | ✗ | Plan in 2025 Q1 |
| Speculative decoding | ✗ | Plan in 2025 Q1 |
| Pooling | ✅ | |
| Enc-dec | ✗ | Plan in 2025 Q2 |
| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ ||
| Prompt logProbs | ✅ ||
| Async output | ✅ ||
| Multi step scheduler | ✗ | Plan in 2025 Q1 |
| Best of | ✅ ||
| Beam search | ✅ ||
| Guided Decoding | ✅ | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | Only "mp" supported now |
| Pipeline Parallel | ✅ | Only "mp" supported now |
| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
| Speculative decoding | ✅ | | | Basic functions available | Need fully test |
| Pooling | ✅ | | | Basic functions available (Bert) | Need fully test and add more models support |
| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
| Multi Modality | ✅ | | ✅ | Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance, and add more models support |
| LogProbs | ✅ | | | Basic functions available | Need fully test |
| Prompt logProbs | ✅ | | | Basic functions available | Need fully test |
| Async output | ✅ | | | Basic functions available | Need fully test |
| Multi step scheduler | ✅ | | | Basic functions available | Need fully test |
| Best of | ✅ | | | Basic functions available | Need fully test |
| Beam search | ✅ | | | Basic functions available | Need fully test |
| Guided Decoding | ✅ | | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | | | Basic functions available | Need fully test |
| Pipeline Parallel | ✅ | | | Basic functions available | Need fully test |