[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?

Update doc.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -17,7 +17,7 @@
## Contributors

-vLLM Ascend every release would not have been possible without the following contributors:
+Every release of vLLM Ascend would not have been possible without the following contributors:

Updated on 2025-09-30:
@@ -1,48 +1,48 @@
# Governance

## Mission

-As a vital component of vLLM, the vLLM Ascend project is dedicated to providing an easy, fast, and cheap LLM Serving for Everyone on Ascend NPU, and to actively contribute to the enrichment of vLLM.
+As a vital component of vLLM, the vLLM Ascend project is dedicated to providing easy, fast, and cheap LLM serving for everyone on Ascend NPUs, and to actively contributing to the enrichment of vLLM.

## Principles

-vLLM Ascend follows the vLLM community's code of conduct:[vLLM - CODE OF CONDUCT](https://github.com/vllm-project/vllm/blob/main/CODE_OF_CONDUCT.md)
+vLLM Ascend follows the vLLM community's code of conduct: [vLLM - CODE OF CONDUCT](https://github.com/vllm-project/vllm/blob/main/CODE_OF_CONDUCT.md)
## Governance - Mechanics

vLLM Ascend is an open-source project under the vLLM community, where the authority to appoint roles is ultimately determined by the vLLM community. It adopts a hierarchical technical governance structure.

- Contributor:

-  **Responsibility:** Help new contributors on boarding, handle and respond to community questions, review RFCs, code
+  **Responsibility:** Help new contributors with onboarding, handle and respond to community questions, and review RFCs and code.

-  **Requirements:** Complete at least 1 contribution. Contributor is someone who consistently and actively participates in a project, included but not limited to issue/review/commits/community involvement.
+  **Requirements:** Complete at least 1 contribution. A contributor is someone who consistently and actively participates in a project, including but not limited to issues/reviews/commits/community involvement.

-  Contributors will be empowered [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) Github repo `Triage` permissions (`Can read and clone this repository. Can also manage issues and pull requests`) to help community developers collaborate more efficiently.
+  Contributor permissions are granted via the `Triage` role on the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) GitHub repo (`Can read and clone this repository. Can also manage issues and pull requests`), facilitating efficient collaboration between community developers.
- Maintainer:

-  **Responsibility:** Develop the project's vision and mission. Maintainers are responsible for driving the technical direction of the entire project and ensuring its overall success, possessing code merge permissions. They formulate the roadmap, review contributions from community members, continuously contribute code, and actively engage in community activities (such as regular meetings/events).
+  **Responsibility:** Develop the project's vision and mission. Maintainers are responsible for shaping the technical direction of the project and ensuring its long-term success. With code merge permissions, they lead roadmap planning, review community contributions, make ongoing code improvements, and actively participate in community engagement, such as regular meetings and events.

-  **Requirements:** Deep understanding of vLLM and vLLM Ascend codebases, with a commitment to sustained code contributions. Competency in design/development/PR review workflows.
-    - **Review Quality:** Actively participate in community code reviews, ensuring high-quality code integration.
-    - **Quality Contribution:** Successfully develop and deliver at least one major feature while maintaining consistent high-quality contributions.
-    - **Community Involvement:** Actively address issues, respond to forum inquiries, participate in discussions, and engage in community-driven tasks.
-  Requires approval from existing Maintainers. The vLLM community has the final decision-making authority.
-  Maintainer will be empowered [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) Github repo write permissions (`Can read, clone, and push to this repository. Can also manage issues and pull requests`).
+  **Requirements:** Deep understanding of the vLLM and vLLM Ascend code bases, with a commitment to sustained code contributions and competency in design, development, and PR review workflows.
+    - **Review quality:** Actively participate in community code reviews, ensuring high-quality code integration.
+    - **Quality contribution:** Successfully develop and deliver at least one major feature while maintaining consistent high-quality contributions.
+    - **Community involvement:** Actively address issues, respond to forum inquiries, participate in discussions, and engage in community-driven tasks.
+  Approval from existing Maintainers is required. The vLLM community has the final decision-making authority.
+  Maintainers will be granted write access to the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) GitHub repo (`Can read, clone, and push to this repository. Can also manage issues and pull requests`).
## Nominating and Removing Maintainers

### The Principles

-- Membership in vLLM Ascend is given to individuals on merit basis after they demonstrated strong expertise of the vLLM / vLLM Ascend through contributions, reviews and discussions.
+- Membership in vLLM Ascend is given to individuals on a merit basis after they demonstrate strong expertise in vLLM/vLLM Ascend through contributions, reviews, and discussions.

-- For membership in the maintainer group the individual has to demonstrate strong and continued alignment with the overall vLLM / vLLM Ascend principles.
+- For membership in the maintainer group, individuals have to demonstrate strong and continued alignment with the overall vLLM/vLLM Ascend principles.

-- Light criteria of moving module maintenance to 'emeritus' status if they don't actively participate over long periods of time.
+- Maintainers who have been inactive for a long time may be transitioned to **emeritus** status under lenient criteria.

- The membership is for an individual, not a company.

### Nomination and Removal

-- Nomination: Anyone can nominate someone to become a maintainer (include self-nominate). All existing maintainers are responsible for evaluating the nomination. The nominator should provide nominee's info around the strength of the candidate to be a maintainer, include but not limited to review quality, quality contribution, community involvement.
-- Removal: Anyone can nominate a person to be removed from maintainer position (include self-nominate). All existing maintainers are responsible for evaluating the nomination. The nominator should provide nominee's info, include but not limited to lack of activity, conflict with the overall direction and other information that makes them unfit to be a maintainer.
+- Nomination: Anyone can nominate a candidate to become a maintainer, including self-nominations. All existing maintainers are responsible for reviewing and evaluating each nomination. The nominator should provide relevant information about the nominee's qualifications, such as review quality, quality of contributions, and community involvement.
+- Removal: Anyone may nominate an individual for removal from the maintainer role, including self-nominations. All current maintainers are responsible for reviewing and evaluating such nominations. The nominator should provide relevant information about the nominee, such as prolonged inactivity, misalignment with the project's overall direction, or other factors that may make them unsuitable for the maintainer role.
@@ -1,16 +1,16 @@
-# User Stories
+# User stories

-Read case studies on how users and developers solves real, everyday problems with vLLM Ascend
+Read case studies on how users and developers solve real, everyday problems with vLLM Ascend.

-- [LLaMA-Factory](./llamafactory.md) is an easy-to-use and efficient platform for training and fine-tuning large language models, it supports vLLM Ascend to speed up inference since [LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739), gain 2x performance enhancement of inference.
+- [LLaMA-Factory](./llamafactory.md) is an easy-to-use and efficient platform for training and fine-tuning large language models. It has supported vLLM Ascend to speed up inference since [LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739), gaining a 2x performance enhancement in inference.

-- [Huggingface/trl](https://github.com/huggingface/trl) is a cutting-edge library designed for post-training foundation models using advanced techniques like SFT, PPO and DPO, it uses vLLM Ascend since [v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) to support RLHF on Ascend NPU.
+- [Huggingface/trl](https://github.com/huggingface/trl) is a cutting-edge library designed for post-training foundation models using advanced techniques like SFT, PPO, and DPO. It has used vLLM Ascend since [v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) to support RLHF on Ascend NPUs.

-- [MindIE Turbo](https://pypi.org/project/mindie-turbo) is an LLM inference engine acceleration plug-in library developed by Huawei on Ascend hardware, which includes self-developed large language model optimization algorithms and optimizations related to the inference engine framework. It supports vLLM Ascend since [2.0rc1](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).
+- [MindIE Turbo](https://pypi.org/project/mindie-turbo) is an LLM inference engine acceleration plugin library developed by Huawei for Ascend hardware, which includes self-developed LLM optimization algorithms and optimizations related to the inference engine framework. It has supported vLLM Ascend since [2.0rc1](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).

-- [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running AI models. It supports vLLM Ascend since [v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2), see more GPUStack performance evaluation info on [link](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew).
+- [GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU cluster manager for running AI models. It has supported vLLM Ascend since [v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2). See more GPUStack performance evaluation information at [this link](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew).

-- [verl](https://github.com/volcengine/verl) is a flexible, efficient and production-ready RL training library for large language models (LLMs), uses vLLM Ascend since [v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0), see more info on [verl x Ascend Quickstart](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html).
+- [verl](https://github.com/volcengine/verl) is a flexible, efficient, and production-ready RL training library for LLMs. It has used vLLM Ascend since [v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0). See more information in [verl x Ascend Quickstart](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html).

:::{toctree}
:caption: More details
@@ -1,19 +1,19 @@
# LLaMA-Factory

-**About / Introduction**
+**Introduction**

[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.

-LLaMA-Facotory users need to evaluate and inference the model after fine-tuning the model.
+LLaMA-Factory users need to evaluate and run inference on the model after fine-tuning.

-**The Business Challenge**
+**Business challenge**

-LLaMA-Factory used transformers to perform inference on Ascend NPU, but the speed was slow.
+LLaMA-Factory used Transformers to perform inference on Ascend NPUs, but the speed was slow.

-**Solving Challenges and Benefits with vLLM Ascend**
+**Benefits with vLLM Ascend**

-With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), the performance of LLaMA-Factory in the model inference stage has been significantly improved. According to the test results, the inference speed of LLaMA-Factory has been increased to 2x compared to the transformers version.
+With the joint efforts of LLaMA-Factory and vLLM Ascend ([LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), LLaMA-Factory has achieved significant performance gains during model inference. Benchmark results show that its inference speed is now up to 2x faster than the Transformers implementation.

**Learn more**

-See more about LLaMA-Factory and how it uses vLLM Ascend for inference on the Ascend NPU in the following documentation: [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).
+See more details about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPUs in [LLaMA-Factory Ascend NPU Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html).
@@ -4,21 +4,21 @@ Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](htt
## vLLM Ascend Plugin versions

-Each vLLM Ascend release will be versioned: `v[major].[minor].[micro][rcN][.postN]` (such as
+Each vLLM Ascend release is versioned as `v[major].[minor].[micro][rcN][.postN]` (such as
`v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)

-- **Final releases**: will typically be released every **3 months**, will take the vLLM upstream release plan and Ascend software product release plan into comprehensive consideration.
-- **Pre releases**: will typically be released **on demand**, ending with rcN, represents the Nth release candidate version, to support early testing by our users prior to a final release.
-- **Post releases**: will typically be released **on demand** to support to address minor errors in a final release. It's different from [PEP-440 post release note](https://peps.python.org/pep-0440/#post-releases) suggestion, it will contain actual bug fixes considering that the final release version should be matched strictly with the vLLM final release version (`v[major].[minor].[micro]`). The post version has to be published as a patch version of the final release.
+- **Final releases**: Typically scheduled every **3 months**, with careful alignment to the vLLM upstream release cycle and the Ascend software product roadmap.
+- **Pre releases**: Typically issued **on demand**, labeled with `rcN` to indicate the Nth release candidate. They are intended to support early testing by users ahead of the final release.
+- **Post releases**: Typically issued **on demand** to address minor errors in a final release. Unlike the [PEP-440 post release](https://peps.python.org/pep-0440/#post-releases) convention, these versions include actual bug fixes, because the final release version must strictly match the corresponding vLLM final release version (`v[major].[minor].[micro]`). Any post version must be published as a patch version of the final release.

For example:
-- `v0.7.x`: it's the first final release to match the vLLM `v0.7.x` version.
-- `v0.7.3rc1`: will be the first pre version of vLLM Ascend.
-- `v0.7.3.post1`: will be the post release if the `v0.7.3` release has some minor errors.
+- `v0.7.x`: the first final release to match the vLLM `v0.7.x` version.
+- `v0.7.3rc1`: the first pre-release version of vLLM Ascend.
+- `v0.7.3.post1`: the post release for `v0.7.3` if it has some minor errors.
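The version scheme above is regular enough to validate mechanically. As an illustration only (this helper is hypothetical and not part of the release tooling), a small Python sketch that parses `v[major].[minor].[micro][rcN][.postN]` tags:

```python
import re

# Matches v[major].[minor].[micro][rcN][.postN], e.g. v0.7.3, v0.7.3rc1, v0.7.3.post1
_TAG_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?"          # optional pre release: rcN
    r"(?:\.post(?P<post>\d+))?$"   # optional post release: .postN
)

def parse_tag(tag: str) -> dict:
    """Split a vLLM Ascend release tag into its numeric components."""
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a valid release tag: {tag!r}")
    return {k: int(v) if v is not None else None for k, v in m.groupdict().items()}
```

So `v0.7.3rc1` parses with `rc=1`, while a final release such as `v0.7.3` has both `rc` and `post` unset.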
-## Release Compatibility Matrix
+## Release compatibility matrix

-Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
+The table below is the release compatibility matrix for the vLLM Ascend Plugin.

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
|-------------|--------------|------------------|-------------|--------------------|--------------|
@@ -40,7 +40,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
## Release cadence

-### release window
+### Release window

| Date | Event |
|------------|-------------------------------------------|
@@ -66,70 +66,70 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
## Branch policy

-vLLM Ascend has main branch and dev branch.
+vLLM Ascend has two kinds of branches: main and dev.

-- **main**: main branch,corresponds to the vLLM main branch and latest 1 or 2 release version. It is continuously monitored for quality through Ascend CI.
+- **main**: corresponds to the vLLM main branch and the latest 1 or 2 release versions. It is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: development branch, created with part of new releases of vLLM. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3` version.

-Usually, a commit should be ONLY first merged in the main branch, and then backported to the dev branch to reduce maintenance costs as much as possible.
+Commits should typically be merged into the main branch first, and only then backported to the dev branch, to reduce maintenance costs as much as possible.
-### Maintenance branch and EOL:
-The branch status will be in one of the following states:
+### Maintenance branch and EOL
+Each branch will be in one of the following states:

-| Branch | Time frame | Summary |
-|-------------------|----------------------------------|----------------------------------------------------------------------|
-| Maintained | Approximately 2-3 minor versions | All bugfixes are appropriate. Releases produced, CI commitment. |
-| Unmaintained | Community interest driven | All bugfixes are appropriate. No Releases produced, No CI commitment |
-| End of Life (EOL) | N/A | Branch no longer accepting changes |
+| Branch | Time Frame | Summary |
+| ----------------- | -------------------------------- | --------------------------------------------------------- |
+| Maintained | Approximately 2-3 minor versions | Bugfixes received; releases produced; CI commitment |
+| Unmaintained | Community-interest driven | Bugfixes received; no releases produced; no CI commitment |
+| End of Life (EOL) | N/A | Branch no longer accepting changes |
-### Branch state
+### Branch states

-Note that vLLM Ascend will only be released for a certain vLLM release version rather than all versions. Hence, You might see only part of versions have dev branches (such as only `0.7.1-dev` / `0.7.3-dev` but no `0.7.2-dev`), this is as expected.
+Note that vLLM Ascend is only released for certain vLLM release versions, not for every version. Hence, you may notice that some versions have corresponding dev branches (e.g. `0.7.1-dev` and `0.7.3-dev`), while others do not (e.g. `0.7.2-dev`). This is expected.

-Usually, each minor version of vLLM (such as 0.7) will correspond to a vLLM Ascend version branch and support its latest version (for example, we plan to support version 0.7.3) as following shown:
+Usually, each minor version of vLLM (such as 0.7) corresponds to a vLLM Ascend version branch that supports its latest version (such as 0.7.3), as shown below:

-| Branch | Status | Note |
-|------------|--------------|--------------------------------------|
-| main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.2 branch |
-| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
-| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version |
-| v0.7.1-dev | Unmaintained | Replaced by v0.7.3-dev |
+| Branch | State | Note |
+| ---------- | ------------ | -------------------------------------------------------- |
+| main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.2 branch |
+| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
+| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version |
+| v0.7.1-dev | Unmaintained | Replaced by v0.7.3-dev |
### Feature branches

-| Branch | Status | RFC link | Merge plan | Mentor |
+| Branch | State | RFC Link | Scheduled Merge Time | Mentor |
|------------|--------------|---------------------------------------|------------|--------|
|rfc/long_seq_optimization|Maintained|https://github.com/vllm-project/vllm/issues/22693|930|wangxiyuan|

- Branch: The feature branch should be created with a prefix `rfc/` followed by the feature name, such as `rfc/feature-name`.
-- Status: The status of the feature branch is `Maintained` until it is merged into the main branch or deleted.
-- RFC link: The feature branch should be created with a corresponding RFC issue. The creation of a feature branch requires an RFC and approval from at least two maintainers.
-- Merge plan: The final goal of a feature branch is to merge it into the main branch. If it exceeds 3 months, the mentor maintainer should evaluate whether to delete the branch.
+- State: The state of the feature branch is `Maintained` until it is merged into the main branch or deleted.
+- RFC Link: The feature branch should be created with a corresponding RFC issue. The creation of a feature branch requires an RFC and approval from at least two maintainers.
+- Scheduled Merge Time: The final goal of a feature branch is to be merged into the main branch. If it remains unmerged for more than three months, the mentor maintainer should evaluate whether to delete the branch.
- Mentor: The mentor should be a vLLM Ascend maintainer who is responsible for the feature branch.
### Backward compatibility

-For main branch, vLLM Ascend should works with vLLM main branch and latest 1 or 2 release version. So to ensure the backward compatibility, we will do the following:
-- Both main branch and target vLLM release is tested by Ascend E2E CI. For example, currently, vLLM main branch and vLLM 0.8.4 are tested now.
-- For code changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. In this case, vLLM Ascend introduced a version check machinism inner the code. It'll check the version of installed vLLM package first to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed an dev/editable version of vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of vLLM package to use.
-- For documentation changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. Note should be added if there are any breaking changes.
+The main branch of vLLM Ascend should work with the vLLM main branch and the latest 1 or 2 releases. To ensure backward compatibility, we do the following:
+- Both the main branch and the target vLLM release, such as the vLLM main branch and vLLM 0.8.4, are tested by Ascend E2E CI.
+- To make sure that code changes are compatible with the latest 1 or 2 vLLM releases, vLLM Ascend introduces a version check mechanism inside the code. It checks the version of the installed vLLM package first to decide which code logic to use. If users hit an `InvalidVersion` error, it may indicate that they have installed a dev or editable version of the vLLM package. In this case, we provide the environment variable `VLLM_VERSION` to let users specify the version of the vLLM package to use.
+- Documentation changes should be compatible with the latest 1 or 2 vLLM releases. Notes should be added if there are any breaking changes.
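The version-check mechanism described above can be sketched roughly as follows. This is an illustrative simplification, not the plugin's actual code: the helper names (`resolve_vllm_version`, `release_tuple`) are hypothetical, while the `VLLM_VERSION` environment variable is the one documented here.

```python
import os

def resolve_vllm_version(detected: str) -> str:
    """Pick the vLLM version string the plugin should assume.

    VLLM_VERSION overrides detection; this is the escape hatch for
    dev/editable installs whose version string cannot be parsed.
    """
    return os.environ.get("VLLM_VERSION", detected)

def release_tuple(version: str) -> tuple:
    """Turn '0.8.4' (or '0.8.4rc1') into (0, 8, 4) for comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

# Hypothetical branch on the installed vLLM version, as the text describes:
if release_tuple(resolve_vllm_version("0.8.4")) >= (0, 8, 0):
    pass  # newer code path
else:
    pass  # compatibility code path
```

The key design point is that the override is read once and everything downstream compares plain release tuples, so a dev build never has to parse cleanly.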
-## Document Branch Policy
+## Document branch policy

To reduce maintenance costs, **all branch documentation content should remain consistent, and version differences can be controlled via variables in [docs/source/conf.py](https://github.com/vllm-project/vllm-ascend/blob/main/docs/source/conf.py)**. While this is not a simple task, it is a principle we should strive to follow.

| Version | Purpose | Code Branch |
|-----|-----|---------|
| latest | Doc for the latest dev branch | vX.Y.Z-dev (Will be `main` after the first final release) |
| version | Doc for historical released versions | Git tags, like vX.Y.Z[rcN] |
-| stable(not yet released) | Doc for latest final release branch | Will be `vX.Y.Z-dev` after the first official release |
+| stable (not yet released) | Doc for latest final release branch | Will be `vX.Y.Z-dev` after the first official release |

-As shown above:
+Notes:

-- `latest` documentation: Matches the current maintenance branch `vX.Y.Z-dev` (Will be `main` after the first final release). Continuously updated to ensure usability for the latest release.
-- `version` documentation: Corresponds to specific released versions (e.g., `v0.7.3`, `v0.7.3rc1`). No further updates after release.
+- `latest` documentation: Matches the current maintenance branch `vX.Y.Z-dev` (will be `main` after the first final release). It is continuously updated to ensure usability for the latest release.
+- `version` documentation: Corresponds to specific released versions (e.g., `v0.7.3`, `v0.7.3rc1`). There are no further updates after release.
- `stable` documentation (**not yet released**): Official release documentation. Updates are allowed in real time after release, typically based on vX.Y.Z-dev. Once stable documentation is available, non-stable versions should display a header warning: `You are viewing the latest developer preview docs. Click here to view docs for the latest stable release.`
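As a sketch of the variable-driven approach described above (the actual variable names used in `docs/source/conf.py` may differ), a Sphinx project using MyST can centralize version strings so that page content stays textually identical across branches:

```python
# Illustrative fragment of a Sphinx docs/source/conf.py; the real variable
# names in vllm-ascend's conf.py may differ.
myst_substitutions = {
    "vllm_version": "0.9.1",            # upstream vLLM version this branch targets
    "vllm_ascend_version": "0.9.1rc1",  # version of vLLM Ascend itself
}
```

A page can then write `{{ vllm_version }}` instead of a hard-coded number, so only `conf.py` differs between documentation branches.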
-## Software Dependency Management
+## Software dependency management

- `torch-npu`: Ascend Extension for PyTorch (torch-npu) releases a stable version to [PyPI](https://pypi.org/project/torch-npu)
every 3 months, a development version (aka the POC version) every month, and a nightly version every day.
-The PyPi stable version **CAN** be used in vLLM Ascend final version, the monthly dev version **ONLY CANN** be used in
-vLLM Ascend RC version for rapid iteration, the nightly version **CANNOT** be used in vLLM Ascend any version and branches.
+The PyPI stable version **CAN** be used in vLLM Ascend final versions, the monthly dev version can **ONLY** be used in
+vLLM Ascend RC versions for rapid iteration, and the nightly version **CANNOT** be used in any vLLM Ascend version or branch.