[Doc] Add patch doc (#1414)
1. Format the developer guide content to make it clearer
2. Add the patch doc for the developer guide

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -85,3 +85,10 @@ If the PR spans more than one category, please include all relevant prefixes.

You can find more information about contributing to the vLLM Ascend backend plugin on [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
If you run into any problem while contributing, feel free to submit a PR to improve the doc and help other developers.

:::{toctree}
:caption: Index
:maxdepth: 1
testing
:::
@@ -1,17 +1,10 @@
# Evaluation
# Accuracy

:::{toctree}
:caption: Accuracy
:maxdepth: 1
using_evalscope
using_lm_eval
using_opencompass
using_evalscope
accuracy_report/index
:::

:::{toctree}
:caption: Performance
:maxdepth: 1
performance_benchmark
profile_execute_duration
:::
9
docs/source/developer_guide/feature_guide/index.md
Normal file
@@ -0,0 +1,9 @@
# Feature Guide

This section provides an overview of the features implemented in vLLM Ascend. Developers can refer to this guide to understand how vLLM Ascend works.

:::{toctree}
:caption: Feature Guide
:maxdepth: 1
patch
:::
82
docs/source/developer_guide/feature_guide/patch.md
Normal file
@@ -0,0 +1,82 @@
# Patch in vLLM Ascend

vLLM Ascend is a platform plugin for vLLM. Because vLLM and vLLM Ascend follow different release cycles, and because of hardware limitations in some cases, we need to patch some code in vLLM to make it compatible with vLLM Ascend.

In the vLLM Ascend code, we provide a patch module, `vllm_ascend/patch`, to adapt to these changes in vLLM.

## Principle

Keep in mind that patching is not the best way to make vLLM Ascend compatible; it is only a temporary solution. The best way is to contribute the change to vLLM so that it is compatible with vLLM Ascend natively. In vLLM Ascend, the patch strategy follows these basic principles:

1. Less is more. Do not patch unless it is currently the only way.
2. Once a patch is added, you must describe the future plan for removing it.
3. Cleaning up patch code is welcome at any time.

## How it works

In `vllm_ascend/patch`, the code structure is as follows:

```
vllm_ascend
├── patch
│   ├── platform
│   │   ├── patch_0_9_1
│   │   ├── patch_common
│   │   ├── patch_main
│   ├── worker
│   │   ├── patch_0_9_1
│   │   ├── patch_common
│   │   ├── patch_main
└───────────
```

- **platform**: The patch code in this directory patches code that runs in the vLLM main process. It is called by `vllm_ascend/platform::NPUPlatform::pre_register_and_update` very early, when vLLM is initialized.
    - In online mode, the vLLM process calls the platform patch at `vllm/vllm/engine/arg_utils.py::AsyncEngineArgs.add_cli_args` when parsing the CLI args.
    - In offline mode, the vLLM process calls the platform patch at `vllm/vllm/engine/arg_utils.py::EngineArgs.create_engine_config` when parsing the input parameters.
- **worker**: The patch code in this directory patches code that runs in the vLLM worker process. It is called by `vllm_ascend/worker/worker_v1::NPUWorker::__init__` when the vLLM worker process is initialized.
    - In both online and offline mode, the vLLM engine core process calls the worker patch at `vllm/vllm/worker/worker_base.py::WorkerWrapperBase.init_worker` when initializing the worker process.

Both the **platform** and **worker** folders contain several patch modules, which are used to patch different versions of vLLM.

- `patch_0_9_1`: This module patches vLLM 0.9.1. The version always tracks the latest vLLM release. Once a new vLLM version is released, we drop this patch module and bump to the new version. For example, `patch_0_9_2` is used for patching vLLM 0.9.2.
- `patch_main`: This module patches the code in the vLLM main branch.
- `patch_common`: This module patches both vLLM 0.9.1 and the vLLM main branch.
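
The choice among these three modules can be pictured with a minimal sketch. It is not the actual vLLM Ascend implementation: the helper name `select_patch_modules`, the exact-match rule on the version string, and the order of the returned list are all assumptions made for illustration.

```python
def select_patch_modules(vllm_version: str, pinned_release: str = "0.9.1") -> list[str]:
    """Return the patch module names that would apply to a vLLM version string.

    Hypothetical helper: any version that is not exactly the pinned release
    is treated as the vLLM main branch.
    """
    version_tag = pinned_release.replace(".", "_")
    if vllm_version == pinned_release:
        versioned = f"patch_{version_tag}"
    else:
        versioned = "patch_main"
    # patch_common covers both the pinned release and the main branch.
    return [versioned, "patch_common"]


print(select_patch_modules("0.9.1"))
print(select_patch_modules("0.9.2.dev10"))
```

Under this assumption, a change needed by both the release and main lives once in `patch_common`, while version-specific differences stay isolated in the other two modules.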

## How to write a patch

Before writing a patch, follow the principle above and patch as little code as possible. If a patch is necessary, put the code in either the **platform** or the **worker** folder. Here is an example of patching the `distributed` module in vLLM.

1. Decide which version of vLLM to patch. For example, after analysis, we want to patch both vLLM 0.9.1 and the main branch.
2. Decide which process to patch. For example, `distributed` belongs to the vLLM main process, so we should patch `platform`.
3. Create the patch file in the right folder. The file should be named `patch_{module_name}.py`. In this example it is `vllm_ascend/patch/platform/patch_common/patch_distributed.py`.
4. Write your patch code in the new file. Here is an example:
    ```python
    import vllm


    def patch_destroy_model_parallel():
        # your patch code
        ...


    # Rebind the upstream function to the patched implementation.
    vllm.distributed.parallel_state.destroy_model_parallel = patch_destroy_model_parallel
    ```
5. Import the patch file in `__init__.py`. In this example, add `import vllm_ascend.patch.platform.patch_common.patch_distributed` to `vllm_ascend/patch/platform/patch_common/__init__.py`.
6. Add a description of the patch in `vllm_ascend/patch/__init__.py`. The description format is as follows:
    ```
    # ** File: <The patch file name> **
    #   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #   1. `<The target patch module in vLLM>`
    #    Why:
    #       <Describe the reason why we need to patch>
    #    How:
    #       <Describe the way to patch>
    #    Related PR (if no, explain why):
    #       <Add a link to the related PR in vLLM. If there is no related PR, explain why>
    #    Future Plan:
    #       <Describe the future plan to remove the patch>
    ```
7. Add unit tests and E2E tests. Any newly added code in vLLM Ascend should come with unit tests and E2E tests as well. You can find more details in the [test guide](../contribution/testing.md).

## Limitation
1. In the V1 Engine, vLLM starts three kinds of processes: the Main process, the EngineCore process, and the Worker process. Currently vLLM Ascend only supports patching code in the Main process and the Worker process by default. If you want to patch code that runs in the EngineCore process, you must patch the EngineCore process entirely during setup. The entry code is in `vllm.v1.engine.core`; override `EngineCoreProc` and `DPEngineCoreProc` entirely.
2. If you are running an edited version of vLLM, the vLLM version may change automatically. For example, if you run an edited vLLM based on v0.9.1, the version may change to v0.9.2xxx. In that case, the patch for v0.9.1 in vLLM Ascend will not work as expected, because vLLM Ascend cannot tell which version of vLLM you are using. You can set the environment variable `VLLM_VERSION` to specify the vLLM version you are using, and the patch for v0.9.1 should then work.
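
The override described in limitation 2 can be sketched as follows. This is an illustration only: `resolve_vllm_version` is a hypothetical helper, the reported version string is made up, and the real lookup in vLLM Ascend may differ.

```python
import os


def resolve_vllm_version(installed_version: str) -> str:
    """Prefer the VLLM_VERSION environment variable over the installed version."""
    return os.environ.get("VLLM_VERSION") or installed_version


# An edited checkout based on v0.9.1 may report a different version string.
reported = "0.9.2.dev1+gabcdef"  # made-up example value

os.environ.pop("VLLM_VERSION", None)
print(resolve_vllm_version(reported))  # falls back to the reported version

os.environ["VLLM_VERSION"] = "0.9.1"
print(resolve_vllm_version(reported))  # the override wins
```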
8
docs/source/developer_guide/performance/index.md
Normal file
@@ -0,0 +1,8 @@
# Performance

:::{toctree}
:caption: Performance
:maxdepth: 1
performance_benchmark
profile_execute_duration
:::
@@ -1,108 +0,0 @@
# Versioning policy

Starting with vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend)) project follows [PEP 440](https://peps.python.org/pep-0440/) and publishes releases that match vLLM ([vllm-project/vllm](https://github.com/vllm-project/vllm)).

## vLLM Ascend Plugin versions

Each vLLM Ascend release is versioned as `v[major].[minor].[micro][rcN][.postN]` (such as `v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`).

- **Final releases**: typically released every **3 months**, taking both the vLLM upstream release plan and the Ascend software product release plan into consideration.
- **Pre releases**: typically released **on demand**. They end with `rcN`, which denotes the Nth release candidate, and support early testing by our users prior to a final release.
- **Post releases**: typically released **on demand** to address minor errors in a final release. Unlike what the [PEP-440 post release note](https://peps.python.org/pep-0440/#post-releases) suggests, a post release contains actual bug fixes, because the final release version must strictly match the vLLM final release version (`v[major].[minor].[micro]`). The post version has to be published as a patch version of the final release.

For example:
- `v0.7.x`: the first final release to match the vLLM `v0.7.x` version.
- `v0.7.3rc1`: the first pre version of vLLM Ascend.
- `v0.7.3.post1`: the post release if the `v0.7.3` release has some minor errors.
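
As a minimal sketch of the version scheme above, the following parser classifies a tag as a pre, post, or final release. The regular expression, the `AscendVersion` type, and `parse_version` are illustrative assumptions, not project code; for full PEP 440 handling a dedicated library would be the better choice.

```python
import re
from typing import NamedTuple, Optional

# v[major].[minor].[micro][rcN][.postN]
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)


class AscendVersion(NamedTuple):
    major: int
    minor: int
    micro: int
    rc: Optional[int]
    post: Optional[int]

    @property
    def kind(self) -> str:
        # A tag with an rc suffix is a pre release, one with .postN a post release.
        if self.rc is not None:
            return "pre"
        if self.post is not None:
            return "post"
        return "final"


def parse_version(tag: str) -> AscendVersion:
    match = VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vLLM Ascend version tag: {tag!r}")
    fields = {
        name: (int(value) if value is not None else None)
        for name, value in match.groupdict().items()
    }
    return AscendVersion(**fields)


for tag in ("v0.7.3rc1", "v0.7.3", "v0.7.3.post1"):
    print(tag, parse_version(tag).kind)
```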

## Release Compatibility Matrix

The following is the Release Compatibility Matrix for the vLLM Ascend Plugin:

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
|-------------|--------------|------------------|-------------|--------------------|--------------|
| v0.9.1rc1 | v0.9.1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1.post1.dev20250528 | |
| v0.9.0rc2 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
| v0.9.0rc1 | v0.9.0 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
| v0.8.5rc1 | v0.8.5.post1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | |
| v0.8.4rc2 | v0.8.4 | >= 3.9, < 3.12 | 8.0.0 | 2.5.1 / 2.5.1 | |
| v0.7.3.post1| v0.7.3 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | 2.0rc1 |
| v0.7.3 | v0.7.3 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 | 2.0rc1 |

## Release cadence

### Release window

| Date | Event |
|------------|-------------------------------------------|
| 2025.06.22 | Release candidates, v0.9.1rc1 |
| 2025.06.10 | Release candidates, v0.9.0rc2 |
| 2025.06.09 | Release candidates, v0.9.0rc1 |
| 2025.05.29 | v0.7.x post release, v0.7.3.post1 |
| 2025.05.08 | v0.7.x Final release, v0.7.3 |
| 2025.05.06 | Release candidates, v0.8.5rc1 |
| 2025.04.28 | Release candidates, v0.8.4rc2 |
| 2025.04.18 | Release candidates, v0.8.4rc1 |
| 2025.03.28 | Release candidates, v0.7.3rc2 |
| 2025.03.14 | Release candidates, v0.7.3rc1 |
| 2025.02.19 | Release candidates, v0.7.1rc1 |

## Branch policy

vLLM Ascend has a main branch and dev branches.

- **main**: the main branch. It corresponds to the vLLM main branch and the latest 1 or 2 release versions, and is continuously monitored for quality through Ascend CI.
- **vX.Y.Z-dev**: a development branch, created along with some new vLLM releases. For example, `v0.7.3-dev` is the dev branch for vLLM `v0.7.3`.

Usually, a commit should first be merged ONLY into the main branch and then backported to the dev branch, to reduce maintenance costs as much as possible.

### Maintenance branch and EOL:
The branch status will be in one of the following states:

| Branch | Time frame | Summary |
|-------------------|----------------------------------|----------------------------------------------------------------------|
| Maintained | Approximately 2-3 minor versions | All bugfixes are appropriate. Releases produced, CI commitment. |
| Unmaintained | Community interest driven | All bugfixes are appropriate. No Releases produced, No CI commitment |
| End of Life (EOL) | N/A | Branch no longer accepting changes |

### Branch state

Note that vLLM Ascend is only released for certain vLLM release versions rather than all of them. Hence, you might see that only some versions have dev branches (such as `0.7.1-dev` / `0.7.3-dev` but no `0.7.2-dev`); this is expected.

Usually, each minor version of vLLM (such as 0.7) corresponds to one vLLM Ascend version branch, which supports its latest version (for example, we plan to support version 0.7.3), as shown below:

| Branch | Status | Note |
|------------|--------------|--------------------------------------|
| main | Maintained | CI commitment for vLLM main branch and vLLM 0.9.x branch |
| v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.0 and 0.9.1 version |
| v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version |
| v0.7.1-dev | Unmaintained | Replaced by v0.7.3-dev |

### Backward compatibility

For the main branch, vLLM Ascend should work with the vLLM main branch and the latest 1 or 2 release versions. To ensure backward compatibility, we do the following:
- Both the main branch and the target vLLM release are tested by Ascend E2E CI. For example, the vLLM main branch and vLLM 0.8.4 are currently tested.
- For code changes, we make sure that the changes are also compatible with the latest 1 or 2 vLLM release versions. To this end, vLLM Ascend introduced a version check mechanism in the code. It first checks the version of the installed vLLM package to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed a dev/editable version of the vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of the vLLM package to use.
- For documentation changes, we make sure that the changes are also compatible with the latest 1 or 2 vLLM release versions. A note should be added if there are any breaking changes.

## Document Branch Policy
To reduce maintenance costs, **all branch documentation content should remain consistent, and version differences can be controlled via variables in [docs/source/conf.py](https://github.com/vllm-project/vllm-ascend/blob/main/docs/source/conf.py)**. While this is not a simple task, it is a principle we should strive to follow.

| Version | Purpose | Code Branch |
|-----|-----|---------|
| latest | Doc for the latest dev branch | vX.Y.Z-dev (Will be `main` after the first final release) |
| version | Doc for historical released versions | Git tags, like vX.Y.Z[rcN] |
| stable(not yet released) | Doc for latest final release branch | Will be `vX.Y.Z-dev` after the first official release |

As shown above:

- `latest` documentation: Matches the current maintenance branch `vX.Y.Z-dev` (will be `main` after the first final release). Continuously updated to ensure usability for the latest release.
- `version` documentation: Corresponds to specific released versions (e.g., `v0.7.3`, `v0.7.3rc1`). No further updates after release.
- `stable` documentation (**not yet released**): Official release documentation. Updates are allowed in real time after release, typically based on `vX.Y.Z-dev`. Once stable documentation is available, non-stable versions should display a header warning: `You are viewing the latest developer preview docs. Click here to view docs for the latest stable release.`.

## Software Dependency Management
- `torch-npu`: Ascend Extension for PyTorch (torch-npu) releases a stable version to [PyPI](https://pypi.org/project/torch-npu) every 3 months, a development version (aka the POC version) every month, and a nightly version every day. The PyPI stable version **CAN** be used in a vLLM Ascend final version, the monthly dev version **CAN ONLY** be used in a vLLM Ascend RC version for rapid iteration, and the nightly version **CANNOT** be used in any vLLM Ascend version or branch.