[Doc] Update FAQ and add test guidance (#1360)
### What this PR does / why we need it?

- Add test guidance
- Add reduced-layer model guidance
- Update FAQ on deterministic calculation

--------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
@@ -4,7 +4,7 @@
 It's recommended to set up a local development environment to build and test
 before you submit a PR.
 
-### Prepare environment and build
+### Setup development environment
 
 Theoretically, the vllm-ascend build is only supported on Linux because
 `vllm-ascend` dependency `torch_npu` only supports Linux.
@@ -48,72 +48,11 @@ bash format.sh
 git commit -sm "your commit info"
 ```
 
-### Testing
+🎉 Congratulations! You have completed the development environment setup.
 
-Although vllm-ascend CI provide integration test on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can run it
-locally. The simplest way to run these integration tests locally is through a container:
-
-```bash
-# Under Ascend NPU environment
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-
-export IMAGE=vllm-ascend-dev-image
-export CONTAINER_NAME=vllm-ascend-dev
-export DEVICE=/dev/davinci1
-
-# The first build will take about 10 mins (10MB/s) to download the base image and packages
-docker build -t $IMAGE -f ./Dockerfile .
-# You can also specify the mirror repo via setting VLLM_REPO to speedup
-# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
-
-docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
-    --device /dev/davinci_manager --device /dev/devmm_svm \
-    --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -ti $IMAGE bash
-
-cd vllm-ascend
-pip install -r requirements-dev.txt
-
-pytest tests/
-```
-
+### Test locally
 
-### Run doctest
-
-vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files.
-The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows:
-
-```{code-block} bash
-:substitutions:
-
-# Update DEVICE according to your device (/dev/davinci[0-7])
-export DEVICE=/dev/davinci0
-# Update the vllm-ascend image
-export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
-docker run --rm \
-    --name vllm-ascend \
-    --device $DEVICE \
-    --device /dev/davinci_manager \
-    --device /dev/devmm_svm \
-    --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -v /root/.cache:/root/.cache \
-    -p 8000:8000 \
-    -it $IMAGE bash
-
-# Run doctest
-/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
-```
-
-This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).
-
+You can refer to the [Testing](./testing.md) doc for setting up the testing environment and running tests locally.
 
 ## DCO and Signed-off-by
docs/source/developer_guide/testing.md (new file, 183 lines)
@@ -0,0 +1,183 @@
# Testing

This section explains how to write e2e tests and unit tests to verify the implementation of your feature.

## Setup test environment

The fastest way to set up a test environment is to use the main branch container image:
:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```{code-block} bash
:substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

::::

::::{tab-item} Multi cards
:sync: multi

```{code-block} bash
:substitutions:

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

::::

:::::
After starting the container, you should install the required packages:

```bash
# Configure a pip mirror (optional; useful when the default index is slow)
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```
## Running tests

### Unit test

There are several principles to follow when writing unit tests:

- The test file path should be consistent with the source file and use the `test_` prefix, e.g. `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`.
- The vLLM Ascend tests use the unittest framework; see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to understand how to write unit tests.
- All unit tests must run on CPU, so you must mock device-related functions so they work on the host.
  - Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).
- You can run the unit tests using `pytest`:

```bash
cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py
```
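To illustrate the mocking principle, here is a minimal self-contained sketch. Note that `FakeNPU` and `pick_tensor_parallel_size` are hypothetical stand-ins for illustration only, not real vllm-ascend code; in a real test you would patch the actual device calls (e.g. via `unittest.mock.patch`):

```python
import unittest
from unittest import mock

class FakeNPU:
    """Hypothetical stand-in for a device runtime such as torch.npu."""
    @staticmethod
    def device_count() -> int:
        raise RuntimeError("requires an Ascend NPU")

def pick_tensor_parallel_size(npu=FakeNPU) -> int:
    """Toy function under test: derive a parallel size from the device count."""
    return max(1, npu.device_count() // 2)

class TestPickTensorParallelSize(unittest.TestCase):
    def test_with_mocked_device_count(self):
        # Mock the device call so the test passes on a CPU-only host.
        fake = mock.Mock()
        fake.device_count.return_value = 8
        self.assertEqual(pick_tensor_parallel_size(npu=fake), 4)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestPickTensorParallelSize))
```

The key point is that the device query is replaced by a `Mock`, so the logic under test executes without any NPU present.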
### E2E test

Although the vllm-ascend CI provides [e2e tests](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend hardware, you can also run them locally.

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```

::::

::::{tab-item} Multi cards
:sync: multi

```bash
cd /vllm-workspace/vllm-ascend/
# Run all multi card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```

::::

:::::
This will reproduce the e2e test CI: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

#### E2E test examples

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
- Online test example: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
- Reduced-layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)

CI resources are limited, so you might need to reduce the number of layers of the model. Below is an example of how to generate a reduced-layer model:

1. Fork the original model repo on ModelScope; we need all the files in the repo except for the weights.
2. Set `num_hidden_layers` to the expected number of layers, e.g. `{"num_hidden_layers": 2,}`.
3. Copy the following Python script as `generate_random_weight.py`. Set the parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed:
```python
import os

import torch
from transformers import AutoConfig
# modeling_deepseek.py comes from the forked model repo (trust_remote_code)
from modeling_deepseek import DeepseekV3ForCausalLM

MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

# Build the pruned model from the edited config and save randomly
# initialized weights in the target dtype.
config = AutoConfig.from_pretrained(
    os.path.expanduser(MODEL_LOCAL_PATH), trust_remote_code=True)
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)
```
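Step 2 above (editing `num_hidden_layers`) can also be scripted. A small sketch, where the `prune_config` helper and the toy config are hypothetical and only illustrate the config edit, not the real model layout:

```python
import json
import os
import tempfile

def prune_config(path: str, num_layers: int = 2) -> dict:
    """Rewrite num_hidden_layers in a model's config.json (step 2 above)."""
    with open(path) as f:
        cfg = json.load(f)
    cfg["num_hidden_layers"] = num_layers
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Demo on a toy config rather than a real model repo.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump({"model_type": "deepseek_v3", "num_hidden_layers": 61}, f)
print(prune_config(path)["num_hidden_layers"])  # prints 2
```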
### Run doctest

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the doc files.
The doctest is a good way to make sure the docs are up to date and the examples are executable. You can run it locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).
@@ -126,3 +126,40 @@ And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after the tenso
### 17. Failed to reinstall vllm-ascend from source after uninstalling vllm-ascend?

You may encounter the problem of C compilation failure when reinstalling vllm-ascend from source using pip. If the installation fails, it is recommended to use `python setup.py install` to install, or use `python setup.py clean` to clear the cache.
### 18. How to generate deterministic results when using vllm-ascend?

There are several factors that affect output determinism:

1. Sampling method: use **greedy sampling** by setting `temperature=0` in `SamplingParams`, e.g.:

    ```python
    from vllm import LLM, SamplingParams

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(temperature=0)
    # Create an LLM.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    ```

2. Set the following environment variables (note: no spaces around `=` in shell assignments):

    ```bash
    export LCCL_DETERMINISTIC=1
    export HCCL_DETERMINISTIC=1
    export ATB_MATMUL_SHUFFLE_K_ENABLE=0
    export ATB_LLM_LCOC_ENABLE=0
    ```
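Why `temperature=0` removes sampling randomness can be seen in a toy sketch. This is plain Python for illustration, not vLLM internals; `sample_token` is a hypothetical next-token sampler:

```python
import math
import random

def sample_token(logits, temperature):
    """Toy next-token sampler: temperature=0 degenerates to greedy argmax."""
    if temperature == 0:
        # Greedy: always the highest-logit token, hence deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise sample from the temperature-scaled softmax distribution.
    weights = [math.exp(l / temperature) for l in logits]
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [1.0, 3.5, 2.0]
# Greedy picks index 1 (logit 3.5) on every call.
print({sample_token(logits, 0) for _ in range(100)})  # prints {1}
```

The environment variables in item 2 address a different source of nondeterminism: floating-point reduction order in collective and matmul kernels.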
@@ -57,6 +57,7 @@ user_guide/release_notes
 :caption: Developer Guide
 :maxdepth: 1
 developer_guide/contributing
+developer_guide/testing
 developer_guide/versioning_policy
 developer_guide/evaluation/index
 :::