v0.10.1rc1

docs/source/developer_guide/contribution/index.md (new file)
@@ -0,0 +1,111 @@
# Contributing

## Building and testing

It's recommended to set up a local development environment to build and test
before you submit a PR.

### Setup development environment

Building vllm-ascend is only supported on Linux, because its dependency
`torch_npu` only supports Linux.

However, you can still set up a development environment on Linux/Windows/macOS
for linting and basic tests with the following commands:

#### Run lint locally

```bash
# Choose a base dir (~/vllm-project/) and set up the venv
cd ~/vllm-project/
python3 -m venv .venv
source ./.venv/bin/activate

# Clone vllm-ascend and install
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

# Install the lint requirements and enable the pre-commit hook
pip install -r requirements-lint.txt

# Run lint (the first run downloads the pre-commit dependencies, which may require a proxy)
bash format.sh
```
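
Since the lint setup wires the checks through pre-commit, you can also invoke the hooks directly once they have been installed. This is only a convenience sketch; hook ids are defined in the repository's pre-commit configuration, and the placeholder id below must be replaced with a real one:

```bash
# Run every configured hook against the whole tree (not just staged files)
pre-commit run --all-files

# Run a single hook by id; check the pre-commit config for the available ids
pre-commit run <hook-id> --all-files
```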

#### Run CI locally

After completing the "Run lint" setup, you can run the CI checks locally:

```{code-block} bash
:substitutions:

cd ~/vllm-project/

# Running CI requires vLLM to be installed
git clone --branch |vllm_version| https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/build.txt
VLLM_TARGET_DEVICE="empty" pip install .
cd ..

# Install requirements
cd vllm-ascend
# For Linux:
pip install -r requirements-dev.txt
# For non-Linux:
cat requirements-dev.txt | grep -Ev '^#|^--|^$|^-r' | while read PACKAGE; do pip install "$PACKAGE"; done
cat requirements.txt | grep -Ev '^#|^--|^$|^-r' | while read PACKAGE; do pip install "$PACKAGE"; done

# Run CI:
bash format.sh ci
```

#### Submit the commit

```bash
# Commit changed files using `-s`
git commit -sm "your commit info"
```

🎉 Congratulations! You have completed the development environment setup.

### Test locally

You can refer to the [Testing](./testing.md) doc for help setting up the testing environment and running tests locally.

## DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header which certifies agreement with the terms of the DCO.

Using `-s` with `git commit` will automatically add this header.
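
For reference, the trailer that `-s` appends to the commit message looks like the following (the name and email come from your git config; the values below are illustrative):

```
Signed-off-by: Jane Doe <jane.doe@example.com>
```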

## PR Title and Classification

Only specific types of PRs will be reviewed. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

- `[Attention]` for new features or optimization in attention.
- `[Communicator]` for new features or optimization in communicators.
- `[ModelRunner]` for new features or optimization in model runner.
- `[Platform]` for new features or optimization in platform.
- `[Worker]` for new features or optimization in worker.
- `[Core]` for new features or optimization in the core vllm-ascend logic (such as platform, attention, communicators, model runner)
- `[Kernel]` for changes affecting compute kernels and ops.
- `[Bugfix]` for bug fixes.
- `[Doc]` for documentation fixes and improvements.
- `[Test]` for tests (such as unit tests).
- `[CI]` for build or continuous integration improvements.
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

:::{note}
If the PR spans more than one category, please include all relevant prefixes.
:::
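
For example, a change that fixes a bug in a compute op and also updates its documentation could be titled `[Bugfix][Doc] Fix accuracy issue in rotary embedding op` (the concrete title here is only an illustration).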

## Others

You may find more information about contributing to the vLLM Ascend backend plugin on [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
If you find any problem while contributing, feel free to submit a PR to improve this doc and help other developers.

:::{toctree}
:caption: Index
:maxdepth: 1
testing
:::

docs/source/developer_guide/contribution/testing.md (new file)
@@ -0,0 +1,285 @@
# Testing

This section explains how to write e2e tests and unit tests to verify the implementation of your feature.

## Setup test environment

The fastest way to set up a test environment is to use the main branch container image:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:selected:
:sync: cpu

You can run the unit tests on CPU with the following steps:

```{code-block} bash
:substitutions:

cd ~/vllm-project/
# ls
# vllm vllm-ascend

# Use a mirror to speed up the download
# docker pull quay.nju.edu.cn/ascend/cann:|cann_image_tag|
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm --name vllm-ascend-ut \
  -v $(pwd):/vllm-project \
  -v ~/.cache:/root/.cache \
  -ti $IMAGE bash

# (Optional) Configure mirrors to speed up downloads
sed -i 's|ports.ubuntu.com|mirrors.huaweicloud.com|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple/

# For the torch-npu dev version or x86 machines
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"

apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2

# Install vllm
cd /vllm-project/vllm
VLLM_TARGET_DEVICE=empty python3 -m pip -v install .

# Install vllm-ascend
cd /vllm-project/vllm-ascend
# [IMPORTANT] Export LD_LIBRARY_PATH so the CANN libraries can be found when running on CPU
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
python3 -m pip install -r requirements-dev.txt
python3 -m pip install -v .
```
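
Optionally, you can confirm that both packages were installed before running any tests. This quick check is just a convenience and not part of the documented setup:

```bash
# Quick sanity check: both packages should show up as installed
python3 -m pip show vllm vllm-ascend | grep -E '^(Name|Version)'
```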

::::

::::{tab-item} Single card
:sync: single

```{code-block} bash
:substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
  --name vllm-ascend \
  --device $DEVICE \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  -it $IMAGE bash
```

After starting the container, you should install the required packages:

```bash
cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

::::

::::{tab-item} Multi cards
:sync: multi

```{code-block} bash
:substitutions:

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
  --name vllm-ascend \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  -it $IMAGE bash
```

After starting the container, you should install the required packages:

```bash
cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

::::

:::::

## Running tests

### Unit test

There are several principles to follow when writing unit tests:

- The test file path should mirror the source file path and use the `test_` prefix, for example: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`
- The vLLM Ascend unit tests use the unittest framework; see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to understand how to write unit tests.
- All unit tests must be runnable on CPU, so you must mock device-related functions so the tests run on the host (see the sketch after the run commands below).
  - Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).
- You can run the unit tests using `pytest`:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:selected:
:sync: cpu

```bash
# Run unit tests
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
TORCH_DEVICE_BACKEND_AUTOLOAD=0 pytest -sv tests/ut
```

::::

::::{tab-item} Single card
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py
```

::::

::::{tab-item} Multi cards test
:sync: multi

```bash
cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py
```

::::

:::::
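
Following the principles above, a CPU-only unit test usually mocks device-only modules such as `torch_npu` before they are imported. The file below is a minimal, self-contained sketch; the test name and the mocked call are illustrative and not taken from the vllm-ascend code base:

```python
# tests/ut/worker/test_my_feature.py -- path mirrors the source file, with the `test_` prefix
import unittest
from unittest import mock


class TestMyFeatureOnCpu(unittest.TestCase):

    # Replace the device-only module with a MagicMock so the import below
    # succeeds on a host without an NPU or a CANN installation.
    @mock.patch.dict("sys.modules", {"torch_npu": mock.MagicMock()})
    def test_runs_without_npu(self):
        import torch_npu  # resolves to the MagicMock injected above

        # Any device call now hits the mock instead of real hardware.
        torch_npu.npu.set_device(0)
        torch_npu.npu.set_device.assert_called_once_with(0)


if __name__ == "__main__":
    unittest.main()
```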

### E2E test

Although vllm-ascend runs [e2e tests](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on the Ascend CI, you can also run them
locally.

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:sync: cpu

You can't run e2e tests on CPU.
::::

::::{tab-item} Single card
:selected:
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single-card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```

::::

::::{tab-item} Multi cards test
:sync: multi

```bash
cd /vllm-workspace/vllm-ascend/
# Run all multi-card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```

::::

:::::

This will reproduce the e2e tests in [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

#### E2E test examples

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
- Online test example: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
- Reduced-layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)
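
As a rough illustration of the offline style used by the examples above, here is a minimal sketch of an e2e case. The model id, prompt, and assertions are placeholders; the real tests rely on shared fixtures and utilities from the repository rather than constructing `LLM` directly like this:

```python
# tests/e2e/singlecard/test_my_feature.py -- minimal offline-inference sketch
from vllm import LLM, SamplingParams


def test_my_feature_generates_text():
    # A small model and greedy sampling keep the run cheap and deterministic;
    # the model id below is a placeholder, pick one your environment can pull.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", max_model_len=1024)
    params = SamplingParams(temperature=0.0, max_tokens=16)

    outputs = llm.generate(["Hello, my name is"], params)

    # One prompt in, one request out, and the completion must be non-empty.
    assert len(outputs) == 1
    assert outputs[0].outputs[0].text.strip()
```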

CI resources are limited, so you might need to reduce the number of layers of the model. Below is an example of how to generate a reduced-layer model:

1. Fork the original model repo on ModelScope; we need all the files in the repo except the weights.
2. Set `num_hidden_layers` to the expected number of layers, e.g., `{"num_hidden_layers": 2,}`
3. Copy the following Python script as `generate_random_weight.py`. Set the relevant parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed:

```python
import torch
from transformers import AutoConfig, AutoTokenizer
# `modeling_deepseek.py` comes from the forked model repo (its trust_remote_code files)
from modeling_deepseek import DeepseekV3ForCausalLM
# from modelscope import snapshot_download  # optionally download the forked repo first

MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

# Load the reduced-layer config (num_hidden_layers was already patched in step 2)
config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)

# Building the model from the config alone gives randomly initialized weights
# instead of downloading the full checkpoint
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)

# Copy the tokenizer alongside the random weights so the output dir is usable
tokenizer = AutoTokenizer.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)
tokenizer.save_pretrained(DIST_MODEL_PATH)
```

### Run doctest

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the doc files.
Doctests are a good way to make sure the docs are up to date and the examples are executable; you can run them locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).