From c25631ec7b4c8d5ff4f23b9bf0b13954c81632e0 Mon Sep 17 00:00:00 2001
From: wangxiyuan
Date: Thu, 13 Mar 2025 17:57:06 +0800
Subject: [PATCH] [Doc] Add the release note for 0.7.3rc1 (#285)

Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan
---
 .../developer_guide/versioning_policy.md      | 20 +++++-----
 docs/source/faqs.md                           |  1 +
 docs/source/tutorials/multi_node.md           |  6 ++-
 docs/source/tutorials/single_npu.md           |  2 +-
 .../source/tutorials/single_npu_multimodal.md |  2 +-
 docs/source/user_guide/release_notes.md       | 30 ++++++++++++++-
 docs/source/user_guide/supported_models.md    | 26 +++++++------
 docs/source/user_guide/suppoted_features.md   | 38 +++++++++----------
 8 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/docs/source/developer_guide/versioning_policy.md b/docs/source/developer_guide/versioning_policy.md
index 3ba5908..2e09801 100644
--- a/docs/source/developer_guide/versioning_policy.md
+++ b/docs/source/developer_guide/versioning_policy.md
@@ -5,7 +5,7 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
 ## vLLM Ascend Plugin versions
 
 Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
-`v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`)
+`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)
 
 - **Final releases**: will typically be released every **3 months**, will take the vLLM upstream release plan and Ascend software product release plan into comprehensive consideration.
 - **Pre releases**: will typically be released **on demand**, ending with rcN, represents the Nth release candidate version, to support early testing by our users prior to a final release.
@@ -13,15 +13,15 @@ Each vllm-ascend release will be versioned: `v[major].[minor].[micro][rcN][.post
 For example:
 
 - `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
-- `v0.7.1rc1`: will be the first pre version of vllm-ascend.
-- `v0.7.1.post1`: will be the post release if the `v0.7.1` release has some minor errors.
+- `v0.7.3rc1`: will be the first pre-release of vllm-ascend.
+- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
 
 ## Branch policy
 
 vllm-ascend has main branch and dev branch.
 
 - **main**: main branch,corresponds to the vLLM main branch, and is continuously monitored for quality through Ascend CI.
-- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.1-dev` is the dev branch for vLLM `v0.7.1` version.
+- **vX.Y.Z-dev**: development branch, created for certain new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.
 
 Usually, a commit should be ONLY first merged in the main branch, and then backported to the dev branch to reduce maintenance costs as much as possible.
 
@@ -67,13 +67,15 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |--------------|--------------| --- | --- | --- |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
 | v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
 
 ## Release cadence
 
-### Next final release (`v0.7.x`) window
+### Next final release (`v0.7.3`) window
 
-| Date | Event |
-|------------|------------------------------------------------------------------|
-| March 2025 | Release candidates, v0.7.3rc1 |
-| March 2025 | Final release passes, match vLLM v0.7.x latest: v0.7.1 or v0.7.3 |
+| Date | Event |
+|------------|-------------------------------------------|
+| 2025.03.14 | Release candidates, v0.7.3rc1 |
+| 2025.03.20 | Release candidates if needed, v0.7.3rc2 |
+| 2025.03.30 | Final release, v0.7.3 |
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index bb4ba5f..3f466b2 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -3,6 +3,7 @@
 
 ## Version Specific FAQs
 
 - [[v0.7.1rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/19)
+- [[v0.7.3rc1] FAQ & 
Feedback](https://github.com/vllm-project/vllm-ascend/issues/267)
 
 ## General FAQs
diff --git a/docs/source/tutorials/multi_node.md b/docs/source/tutorials/multi_node.md
index 324f414..d674d8a 100644
--- a/docs/source/tutorials/multi_node.md
+++ b/docs/source/tutorials/multi_node.md
@@ -55,6 +55,10 @@ export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 ray start --address='{head_node_ip}:{port_num}' --num-gpus=8 --node-ip-address={local_ip}
 ```
 
+:::{note}
+If you're running DeepSeek V3/R1, please remove the `quantization_config` section from the `config.json` file, since it's not supported by vllm-ascend currently.
+:::
+
 Start the vLLM server on head node:
 
 ```shell
@@ -106,4 +110,4 @@ Logs of the vllm server:
 ```
 INFO: 127.0.0.1:59384 - "POST /v1/completions HTTP/1.1" 200 OK
 INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
-```
\ No newline at end of file
+```
diff --git a/docs/source/tutorials/single_npu.md b/docs/source/tutorials/single_npu.md
index 63e9331..445d951 100644
--- a/docs/source/tutorials/single_npu.md
+++ b/docs/source/tutorials/single_npu.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen 7B)
+# Single NPU (Qwen2.5 7B)
 
 ## Run vllm-ascend on Single NPU
diff --git a/docs/source/tutorials/single_npu_multimodal.md b/docs/source/tutorials/single_npu_multimodal.md
index c893090..3b01397 100644
--- a/docs/source/tutorials/single_npu_multimodal.md
+++ b/docs/source/tutorials/single_npu_multimodal.md
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5-VL-7B)
+# Single NPU (Qwen2.5-VL 7B)
 
 ## Run vllm-ascend on Single NPU
diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md
index 28849f6..a1d70b2 100644
--- a/docs/source/user_guide/release_notes.md
+++ b/docs/source/user_guide/release_notes.md
@@ -1,5 +1,33 @@
 # Release note
 
+## v0.7.3rc1
+
+πŸŽ‰ Hello, World! 
This is the first release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
+- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
+
+### Highlights
+- DeepSeek V3/R1 works well now. Read the [official guide](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_node.html) to start! [#242](https://github.com/vllm-project/vllm-ascend/pull/242)
+- Speculative decoding feature is supported. [#252](https://github.com/vllm-project/vllm-ascend/pull/252)
+- Multi step scheduler feature is supported. [#300](https://github.com/vllm-project/vllm-ascend/pull/300)
+
+### Core
+- Bump torch_npu version to dev20250308.3 to improve `_exponential` accuracy
+- Added initial support for pooling models. Bert-based models, such as `BAAI/bge-base-en-v1.5` and `BAAI/bge-reranker-v2-m3`, work now. [#229](https://github.com/vllm-project/vllm-ascend/pull/229)
+
+### Model
+- The performance of Qwen2-VL is improved. [#241](https://github.com/vllm-project/vllm-ascend/pull/241)
+- MiniCPM is now supported. [#164](https://github.com/vllm-project/vllm-ascend/pull/164)
+
+### Other
+- Support MTP (Multi-Token Prediction) for DeepSeek V3/R1 [#236](https://github.com/vllm-project/vllm-ascend/pull/236)
+- [Docs] Added more model tutorials, including DeepSeek, QwQ, Qwen and Qwen2.5-VL. See the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/index.html) for details
+- Pin modelscope<1.23.0 on vLLM v0.7.3 to resolve: https://github.com/vllm-project/vllm/pull/13807
+
+### Known issues
+- In [some cases](https://github.com/vllm-project/vllm-ascend/issues/324), especially when the input/output is very long, the output may be inaccurate. We are working on it. It'll be fixed in the next release.
+- Improved and reduced the garbled text in model output. 
But if you still hit the issue, try changing the generation config values, such as `temperature`, and try again. There is also a known issue shown below. Any [feedback](https://github.com/vllm-project/vllm-ascend/issues/267) is welcome. [#277](https://github.com/vllm-project/vllm-ascend/pull/277)
+
 ## v0.7.1rc1
 
 πŸŽ‰ Hello, World!
@@ -8,7 +36,7 @@ We are excited to announce the first release candidate of v0.7.1 for vllm-ascend
 
 vLLM Ascend Plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
 
-Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
+Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1-dev) to start the journey. Note that this is a release candidate, and there may be some bugs or issues. 
We appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19)
 
 ### Highlights
diff --git a/docs/source/user_guide/supported_models.md b/docs/source/user_guide/supported_models.md
index 48b9cee..06dc0cd 100644
--- a/docs/source/user_guide/supported_models.md
+++ b/docs/source/user_guide/supported_models.md
@@ -2,25 +2,27 @@
 
 | Model | Supported | Note |
 |---------|-----------|------|
-| Qwen 2.5 | βœ… ||
+| DeepSeek v3 | βœ… ||
+| DeepSeek R1 | βœ… ||
+| DeepSeek Distill (Qwen/Llama) |βœ…||
+| Qwen2-VL | βœ… ||
+| Qwen2-Audio | βœ… ||
+| Qwen2.5 | βœ… ||
+| Qwen2.5-VL | βœ… ||
+| MiniCPM |βœ…| |
+| Llama3.1/3.2 | βœ… ||
 | Mistral | | Need test |
 | DeepSeek v2.5 | |Need test |
-| DeepSeek v3 | βœ…|||
-| DeepSeek Distill (Qwen/llama) |βœ…||
-| LLama3.1/3.2 | βœ… ||
 | Gemma-2 | |Need test|
-| baichuan | |Need test|
-| minicpm | |Need test|
-| internlm | βœ… ||
-| ChatGLM | βœ… ||
-| InternVL 2.5 | βœ… ||
-| Qwen2-VL | βœ… ||
+| Baichuan | |Need test|
+| InternLM | βœ… ||
+| ChatGLM | ❌ | Plan in Q2|
+| InternVL2.5 | βœ… ||
 | GLM-4v | |Need test|
 | Molomo | βœ… ||
-| LLaVA 1.5 | βœ… ||
+| LLaVA1.5 | | Need test|
 | Mllama | |Need test|
 | LLaVA-Next | |Need test|
 | LLaVA-Next-Video | |Need test|
 | Phi-3-Vison/Phi-3.5-Vison | |Need test|
 | Ultravox | |Need test|
-| Qwen2-Audio | βœ… ||
diff --git a/docs/source/user_guide/suppoted_features.md b/docs/source/user_guide/suppoted_features.md
index b864ef6..94e8768 100644
--- a/docs/source/user_guide/suppoted_features.md
+++ b/docs/source/user_guide/suppoted_features.md
@@ -1,21 +1,21 @@
 # Feature Support
 
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | βœ— | Plan in 2025 Q1 |
-| Automatic Prefix Caching | βœ… | Improve performance in 2025 Q2 |
-| LoRA | βœ— | Plan in 2025 Q1 |
-| Prompt adapter | βœ— | Plan in 2025 Q1 |
-| Speculative decoding | βœ— | Plan in 2025 Q1 |
-| Pooling | βœ… | |
-| Enc-dec | βœ— | Plan in 2025 Q2 |
-| Multi Modality | βœ… 
(LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs | βœ… ||
-| Prompt logProbs | βœ… ||
-| Async output | βœ… ||
-| Multi step scheduler | βœ— | Plan in 2025 Q1 |
-| Best of | βœ… ||
-| Beam search | βœ… ||
-| Guided Decoding | βœ… | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel | βœ… | Only "mp" supported now |
-| Pipeline Parallel | βœ… | Only "mp" supported now |
+| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
+|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
+| Chunked Prefill | ❌ | | | NA | Plan in 2025.03.30 |
+| Automatic Prefix Caching | ❌ | | | NA | Plan in 2025.03.30 |
+| LoRA | ❌ | | | NA | Plan in 2025.06.30 |
+| Prompt adapter | ❌ | | | NA | Plan in 2025.06.30 |
+| Speculative decoding | βœ… | | | Basic functions available | Needs full testing |
+| Pooling | βœ… | | | Basic functions available (BERT) | Needs full testing and more model support |
+| Enc-dec | ❌ | | | NA | Plan in 2025.06.30 |
+| Multi Modality | βœ… | | βœ… | Basic functions available (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Improve performance and add more model support |
+| LogProbs | βœ… | | | Basic functions available | Needs full testing |
+| Prompt logProbs | βœ… | | | Basic functions available | Needs full testing |
+| Async output | βœ… | | | Basic functions available | Needs full testing |
+| Multi step scheduler | βœ… | | | Basic functions available | Needs full testing |
+| Best of | βœ… | | | Basic functions available | Needs full testing |
+| Beam search | βœ… | | | Basic functions available | Needs full testing |
+| Guided Decoding | βœ… | | | Basic functions available | Find more details at the [issue](https://github.com/vllm-project/vllm-ascend/issues/177) |
+| Tensor Parallel | βœ… | | | Basic functions available | Needs full testing |
+| Pipeline Parallel | βœ… | | | Basic functions 
available | Needs full testing |
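The multi-node tutorial note added above asks DeepSeek V3/R1 users to remove the `quantization_config` section from the model's `config.json` before serving. A minimal sketch of automating that edit with the Python standard library; the `strip_quantization_config` helper and the `model_dir` argument are illustrative, not part of vllm-ascend:

```python
import json
from pathlib import Path


def strip_quantization_config(model_dir: str) -> bool:
    """Remove the `quantization_config` section from a model's config.json.

    Returns True if the section was present and removed, False otherwise.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())
    # Drop the unsupported section if it exists; leave everything else intact.
    removed = config.pop("quantization_config", None) is not None
    if removed:
        config_path.write_text(json.dumps(config, indent=2))
    return removed
```

Run it once against the local model directory before starting the vLLM server on the head node.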