From adee9dd3b182cb22a4f609bea7ddce4e44192a17 Mon Sep 17 00:00:00 2001 From: lilinsiman Date: Thu, 13 Nov 2025 15:53:58 +0800 Subject: [PATCH] [Info][main] Correct the mistake in information documents (#4157) ### What this PR does / why we need it? Correct the mistake in information documents ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: https://github.com/vllm-project/vllm/commit/2918c1b49c88c29783c86f78d2c4221cb9622379 --------- Signed-off-by: lilinsiman --- docs/source/community/contributors.md | 2 +- docs/source/community/user_stories/index.md | 2 +- docs/source/community/versioning_policy.md | 2 +- docs/source/faqs.md | 5 +++-- .../LC_MESSAGES/user_guide/feature_guide/graph_mode.po | 2 +- docs/source/tutorials/single_npu_qwen2.5_vl.md | 2 ++ docs/source/user_guide/feature_guide/graph_mode.md | 10 +++++----- docs/source/user_guide/release_notes.md | 2 +- .../user_guide/support_matrix/supported_models.md | 2 +- 9 files changed, 16 insertions(+), 13 deletions(-) diff --git a/docs/source/community/contributors.md b/docs/source/community/contributors.md index 5fcaa05d..84985be1 100644 --- a/docs/source/community/contributors.md +++ b/docs/source/community/contributors.md @@ -1,4 +1,4 @@ -# Maintainers and contributors +# Maintainers and Contributors ## Maintainers diff --git a/docs/source/community/user_stories/index.md b/docs/source/community/user_stories/index.md index e9b3edd9..7fdf62a0 100644 --- a/docs/source/community/user_stories/index.md +++ b/docs/source/community/user_stories/index.md @@ -1,4 +1,4 @@ -# User stories +# User Stories Read case studies on how users and developers solve real, everyday problems with vLLM Ascend diff --git a/docs/source/community/versioning_policy.md b/docs/source/community/versioning_policy.md index f4ee66df..afc09212 100644 --- a/docs/source/community/versioning_policy.md +++ b/docs/source/community/versioning_policy.md @@ -1,4 +1,4 @@ -# 
Versioning policy +# Versioning Policy Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend)) project follows the [PEP 440](https://peps.python.org/pep-0440/) to publish matching with vLLM ([vllm-project/vllm](https://github.com/vllm-project/vllm)). diff --git a/docs/source/faqs.md b/docs/source/faqs.md index b4987918..aea56b26 100644 --- a/docs/source/faqs.md +++ b/docs/source/faqs.md @@ -15,7 +15,8 @@ Currently, **ONLY** Atlas A2 series(Ascend-cann-kernels-910b),Atlas A3 series( - Atlas 800I A2 Inference series (Atlas 800I A2) - Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD) - Atlas 800I A3 Inference series (Atlas 800I A3) -- [Experimental] Atlas 300I Inference series (Atlas 300I Duo). Currently for 310I Duo the stable version is vllm-ascend v0.10.0rc1. +- [Experimental] Atlas 300I Inference series (Atlas 300I Duo). +- [Experimental] For the 310I Duo, the current stable version is vllm-ascend v0.10.0rc1. Below series are NOT supported yet: - Atlas 200I A2 (Ascend-cann-kernels-310b) unplanned yet @@ -135,7 +136,7 @@ OOM errors typically occur when the model exceeds the memory capacity of a singl In scenarios where NPUs have limited high bandwidth memory (HBM) capacity, dynamic memory allocation/deallocation during inference can exacerbate memory fragmentation, leading to OOM. To address this: -- **Limit --max-model-len**: It can save the HBM usage for kv cache initialization step. +- **Limit `--max-model-len`**: It reduces HBM usage during the KV cache initialization step. - **Adjust `--gpu-memory-utilization`**: If unspecified, the default value is `0.9`. You can decrease this value to reserve more memory to reduce fragmentation risks. See details in: [vLLM - Inference and Serving - Engine Arguments](https://docs.vllm.ai/en/latest/serving/engine_args.html#vllm.engine.arg_utils-_engine_args_parser-cacheconfig). 
diff --git a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po index 9680bdbe..b2336a4e 100644 --- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po +++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po @@ -61,7 +61,7 @@ msgid "" "**ACLGraph**: This is the default graph mode supported by vLLM Ascend. In " "v0.9.1rc1, only Qwen series models are well tested." msgstr "" -"**ACLGraph**:这是 vLLM Ascend 支持的默认图模式。在 v0.9.1rc1 版本中,只有 Qwen 系列模型得到了充分测试。" +"**ACLGraph**:这是 vLLM Ascend 支持的默认图模式。在 v0.9.1rc1 版本中,Qwen 和 DeepSeek 系列模型得到了充分测试。" #: ../../user_guide/feature_guide/graph_mode.md:15 msgid "" diff --git a/docs/source/tutorials/single_npu_qwen2.5_vl.md b/docs/source/tutorials/single_npu_qwen2.5_vl.md index 2454e0c7..f4a335e7 100644 --- a/docs/source/tutorials/single_npu_qwen2.5_vl.md +++ b/docs/source/tutorials/single_npu_qwen2.5_vl.md @@ -47,6 +47,8 @@ Run the following script to execute offline inference on a single NPU: pip install qwen_vl_utils --extra-index-url https://download.pytorch.org/whl/cpu/ ``` +Create a Python script with the following content: + ```python from transformers import AutoProcessor from vllm import LLM, SamplingParams diff --git a/docs/source/user_guide/feature_guide/graph_mode.md b/docs/source/user_guide/feature_guide/graph_mode.md index 90aba6a3..43236289 100644 --- a/docs/source/user_guide/feature_guide/graph_mode.md +++ b/docs/source/user_guide/feature_guide/graph_mode.md @@ -11,7 +11,7 @@ This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. P From v0.9.1rc1 with V1 Engine, vLLM Ascend will run models in graph mode by default to keep the same behavior with vLLM. If you hit any issues, please feel free to open an issue on GitHub and fallback to the eager mode temporarily by setting `enforce_eager=True` when initializing the model. 
There are two kinds for graph mode supported by vLLM Ascend: -- **ACLGraph**: This is the default graph mode supported by vLLM Ascend. In v0.9.1rc1, only Qwen series models are well tested. +- **ACLGraph**: This is the default graph mode supported by vLLM Ascend. In v0.9.1rc1, Qwen and DeepSeek series models are well tested. - **TorchAirGraph**: This is the GE graph mode. In v0.9.1rc1, only DeepSeek series models are supported. ## Using ACLGraph @@ -24,7 +24,7 @@ import os from vllm import LLM -model = LLM(model="Qwen/Qwen2-7B-Instruct") +model = LLM(model="path/to/Qwen2-7B-Instruct") outputs = model.generate("Hello, how are you?") ``` @@ -44,15 +44,15 @@ Offline example: import os from vllm import LLM -# TorchAirGraph is only work without chunked-prefill now -model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True}}) +# TorchAirGraph only works without chunked-prefill now +model = LLM(model="path/to/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True}}) outputs = model.generate("Hello, how are you?") ``` Online example: ```shell -vllm serve deepseek-ai/DeepSeek-R1-0528 --additional-config='{"torchair_graph_config": {"enabled": true},"ascend_scheduler_config": {"enabled": true}}' +vllm serve path/to/DeepSeek-R1-0528 --additional-config='{"torchair_graph_config": {"enabled": true},"ascend_scheduler_config": {"enabled": true}}' ``` You can find more details about additional configuration [here](../configuration/additional_config.md). diff --git a/docs/source/user_guide/release_notes.md b/docs/source/user_guide/release_notes.md index e792e31e..f1e2792c 100644 --- a/docs/source/user_guide/release_notes.md +++ b/docs/source/user_guide/release_notes.md @@ -136,7 +136,7 @@ This is the 1st release candidate of v0.10.1 for vLLM Ascend. 
Please follow the * Added `lmhead_tensor_parallel_size` in `additional_config`, set it to enable lmhead tensor parallel. [#2309](https://github.com/vllm-project/vllm-ascend/pull/2309) * Some unused environment variables `HCCN_PATH`, `PROMPT_DEVICE_ID`, `DECODE_DEVICE_ID`, `LLMDATADIST_COMM_PORT` and `LLMDATADIST_SYNC_CACHE_WAIT_TIME` are removed. [#2448](https://github.com/vllm-project/vllm-ascend/pull/2448) * Environment variable `VLLM_LLMDD_RPC_PORT` is renamed to `VLLM_ASCEND_LLMDD_RPC_PORT` now. [#2450](https://github.com/vllm-project/vllm-ascend/pull/2450) - * Added `VLLM_ASCEND_ENABLE_MLP_OPTIMIZE` in environment variables, whether to enable mlp optimize when tensor parallel is enabled. This feature provides better performance in eager mode. [#2120](https://github.com/vllm-project/vllm-ascend/pull/2120) + * Added `VLLM_ASCEND_ENABLE_MLP_OPTIMIZE` in environment variables, which controls whether to enable the MLP optimization when tensor parallel is enabled. This feature provides better performance in eager mode. [#2120](https://github.com/vllm-project/vllm-ascend/pull/2120) * Removed `MOE_ALL2ALL_BUFFER` and `VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ` in environment variables. [#2612](https://github.com/vllm-project/vllm-ascend/pull/2612) * Added `enable_prefetch` in `additional_config`, Whether to enable weight prefetch. [#2465](https://github.com/vllm-project/vllm-ascend/pull/2465) * Added `mode` in `additional_config.torchair_graph_config`, When using reduce-overhead mode for torchair, mode needs to be set. 
[#2461](https://github.com/vllm-project/vllm-ascend/pull/2461) diff --git a/docs/source/user_guide/support_matrix/supported_models.md b/docs/source/user_guide/support_matrix/supported_models.md index c72bcdb8..a9260120 100644 --- a/docs/source/user_guide/support_matrix/supported_models.md +++ b/docs/source/user_guide/support_matrix/supported_models.md @@ -71,7 +71,7 @@ Get the latest info here: https://github.com/vllm-project/vllm-ascend/issues/160 | LLaVA-Next-Video | ✅ | ||||||||||||||||||| | MiniCPM-V | ✅ | ||||||||||||||||||| | Mistral3 | ✅ | ||||||||||||||||||| -| Phi-3-Vison/Phi-3.5-Vison | ✅ | ||||||||||||||||||| +| Phi-3-Vision/Phi-3.5-Vision | ✅ | ||||||||||||||||||| | Gemma3 | ✅ | ||||||||||||||||||| | Llama4 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) ||||||||||||||||||| | Llama3.2 | ❌ | [1972](https://github.com/vllm-project/vllm-ascend/issues/1972) |||||||||||||||||||
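
The graph-mode examples in this patch pass `--additional-config` to `vllm serve` as a JSON literal on the command line. A minimal sketch of composing and validating that value before launching the server — the config keys (`torchair_graph_config`, `ascend_scheduler_config`) come from the examples above; the script itself is illustrative and uses only the standard library:

```python
import json

# Graph-mode settings from the online serving example:
# TorchAir graph mode plus the Ascend scheduler, both enabled.
additional_config = {
    "torchair_graph_config": {"enabled": True},
    "ascend_scheduler_config": {"enabled": True},
}

# Serialize to the JSON string expected on the CLI, e.g.
#   vllm serve path/to/DeepSeek-R1-0528 --additional-config='<json>'
# Note that json.dumps emits lowercase true/false, matching the shell example.
config_arg = json.dumps(additional_config)

# Round-trip to confirm the string is valid JSON before starting the server.
assert json.loads(config_arg) == additional_config
print(config_arg)
# → {"torchair_graph_config": {"enabled": true}, "ascend_scheduler_config": {"enabled": true}}
```

Building the argument this way avoids the easy-to-make mistake of writing Python-style `True` inside the shell string, which is not valid JSON.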