diff --git a/docs/source/developer_guide/contributing.md b/docs/source/developer_guide/contributing.md index aa7ec08..f6209dd 100644 --- a/docs/source/developer_guide/contributing.md +++ b/docs/source/developer_guide/contributing.md @@ -4,7 +4,7 @@ It's recommended to set up a local development environment to build and test before you submit a PR. -### Prepare environment and build +### Set up the development environment Theoretically, the vllm-ascend build is only supported on Linux because `vllm-ascend` dependency `torch_npu` only supports Linux. @@ -48,72 +48,11 @@ bash format.sh git commit -sm "your commit info" ``` -### Testing +🎉 Congratulations! You have completed the development environment setup. -Although vllm-ascend CI provide integration test on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can run it -locally. The simplest way to run these integration tests locally is through a container: - -```bash -# Under Ascend NPU environment -git clone https://github.com/vllm-project/vllm-ascend.git -cd vllm-ascend - -export IMAGE=vllm-ascend-dev-image -export CONTAINER_NAME=vllm-ascend-dev -export DEVICE=/dev/davinci1 - -# The first build will take about 10 mins (10MB/s) to download the base image and packages -docker build -t $IMAGE -f ./Dockerfile . -# You can also specify the mirror repo via setting VLLM_REPO to speedup -# docker build -t $IMAGE -f ./Dockerfile . 
--build-arg VLLM_REPO=https://gitee.com/mirrors/vllm - -docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \ - --device /dev/davinci_manager --device /dev/devmm_svm \ - --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \ - -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ - -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \ - -ti $IMAGE bash - -cd vllm-ascend -pip install -r requirements-dev.txt - -pytest tests/ -``` - - -### Run doctest - -vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files. -The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows: - -```{code-block} bash - :substitutions: - -# Update DEVICE according to your device (/dev/davinci[0-7]) -export DEVICE=/dev/davinci0 -# Update the vllm-ascend image -export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version| -docker run --rm \ ---name vllm-ascend \ ---device $DEVICE \ ---device /dev/davinci_manager \ ---device /dev/devmm_svm \ ---device /dev/hisi_hdc \ --v /usr/local/dcmi:/usr/local/dcmi \ --v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ --v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \ --v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \ --v /etc/ascend_install.info:/etc/ascend_install.info \ --v /root/.cache:/root/.cache \ --p 8000:8000 \ --it $IMAGE bash - -# Run doctest -/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh -``` - -This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml). +### Test locally +You can refer to the [Testing](./testing.md) doc to set up the testing environment and run tests locally. 
## DCO and Signed-off-by diff --git a/docs/source/developer_guide/testing.md b/docs/source/developer_guide/testing.md new file mode 100644 index 0000000..c7f413e --- /dev/null +++ b/docs/source/developer_guide/testing.md @@ -0,0 +1,183 @@ +# Testing + +This section explains how to write e2e tests and unit tests to verify the implementation of your feature. + +## Set up the test environment + +The fastest way to set up a test environment is to use the main branch container image: + +:::::{tab-set} +:sync-group: e2e + +::::{tab-item} Single card +:selected: +:sync: single + +```{code-block} bash + :substitutions: + +# Update DEVICE according to your device (/dev/davinci[0-7]) +export DEVICE=/dev/davinci0 +# Update the vllm-ascend image +export IMAGE=quay.io/ascend/vllm-ascend:main +docker run --rm \ + --name vllm-ascend \ + --device $DEVICE \ + --device /dev/davinci_manager \ + --device /dev/devmm_svm \ + --device /dev/hisi_hdc \ + -v /usr/local/dcmi:/usr/local/dcmi \ + -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ + -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \ + -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \ + -v /etc/ascend_install.info:/etc/ascend_install.info \ + -v /root/.cache:/root/.cache \ + -p 8000:8000 \ + -it $IMAGE bash +``` + +:::: + +::::{tab-item} Multi-card +:sync: multi + +```{code-block} bash + :substitutions: + +# Update the vllm-ascend image +export IMAGE=quay.io/ascend/vllm-ascend:main +docker run --rm \ + --name vllm-ascend \ + --device /dev/davinci0 \ + --device /dev/davinci1 \ + --device /dev/davinci2 \ + --device /dev/davinci3 \ + --device /dev/davinci_manager \ + --device /dev/devmm_svm \ + --device /dev/hisi_hdc \ + -v /usr/local/dcmi:/usr/local/dcmi \ + -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ + -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \ + -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \ + -v 
/etc/ascend_install.info:/etc/ascend_install.info \ + -v /root/.cache:/root/.cache \ + -p 8000:8000 \ + -it $IMAGE bash +``` +:::: + +::::: + +After starting the container, you should install the required packages: + +```bash +# Prepare +pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple + +# Install required packages +pip install -r requirements-dev.txt +``` + +## Running tests + +### Unit test + +There are several principles to follow when writing unit tests: + +- The test file path should mirror the source file path and start with the `test_` prefix, such as: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py` +- The vLLM Ascend tests use the unittest framework; see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to understand how to write unit tests. +- All unit tests must be able to run on CPU, so you must mock device-related functions on the host. +- Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py). +- You can run the unit tests using `pytest`: + + ```bash + cd /vllm-workspace/vllm-ascend/ + # Run all unit tests + pytest -sv tests/ut + + # Run a single test file + pytest -sv tests/ut/test_ascend_config.py + ``` + +### E2E test + +Although the vllm-ascend CI runs [e2e tests](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend hardware, you can also run them locally. 
+ +:::::{tab-set} +:sync-group: e2e + +::::{tab-item} Single card +:selected: +:sync: single + +```bash +cd /vllm-workspace/vllm-ascend/ +# Run all single card tests +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/ + +# Run a certain test script +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py + +# Run a certain case in test script +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models +``` +:::: + +::::{tab-item} Multi-card +:sync: multi + +```bash +cd /vllm-workspace/vllm-ascend/ +# Run all multi card tests +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/ + +# Run a certain test script +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py + +# Run a certain case in test script +VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models +``` +:::: + +::::: + +This will reproduce the e2e test workflow: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml). 
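As a complement to the commands above, here is a minimal, self-contained sketch of a host-only unit test in the unittest style the project uses. The `visible_device_ids` helper is hypothetical and stands in for any code that calls into `torch_npu`; the test mocks that device-related dependency so it runs on a CPU-only host:

```python
import sys
import unittest
from unittest import mock


def visible_device_ids():
    """Hypothetical production helper that queries the NPU runtime."""
    import torch_npu  # device-related import; unavailable on CPU-only hosts
    return list(range(torch_npu.npu.device_count()))


class TestVisibleDeviceIds(unittest.TestCase):
    def test_ids_with_mocked_device(self):
        # Replace torch_npu with a mock so the test never touches a device.
        fake_torch_npu = mock.MagicMock()
        fake_torch_npu.npu.device_count.return_value = 2
        with mock.patch.dict(sys.modules, {"torch_npu": fake_torch_npu}):
            self.assertEqual(visible_device_ids(), [0, 1])


result = unittest.TextTestRunner(verbosity=2).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestVisibleDeviceIds)
)
```

Real tests under `tests/ut/` follow the same pattern against the project's actual modules and can be run with `pytest -sv` as shown above.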
+ +#### E2E test examples + +- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py) +- Online test example: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py) +- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py) +- Reduced-layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48) + + The CI resources are limited, so you might need to reduce the number of layers in the model. Below is an example of how to generate a reduced-layer model: + 1. Fork the original model repo on ModelScope; we need all the files in the repo except the weights. + 2. Set `num_hidden_layers` to the expected number of layers, e.g., `{"num_hidden_layers": 2,}` + 3. Copy the following Python script as `generate_random_weight.py`. Set the relevant parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed: + + ```python + import torch + from transformers import AutoConfig + from modeling_deepseek import DeepseekV3ForCausalLM + + MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning" + DIST_DTYPE = torch.bfloat16 + DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer" + + config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True) + # Building the model from config only (not from_pretrained) leaves the weights randomly initialized + model = DeepseekV3ForCausalLM(config) + model = model.to(DIST_DTYPE) + model.save_pretrained(DIST_MODEL_PATH) + ``` + +### Run doctest + +vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the doc files. 
+Doctests are a good way to make sure the docs are up to date and the examples are executable. You can run them locally as follows: + +```bash +# Run doctest +/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh +``` + +This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml). diff --git a/docs/source/faqs.md b/docs/source/faqs.md index c14e8cb..ac43bef 100644 --- a/docs/source/faqs.md +++ b/docs/source/faqs.md @@ -126,3 +126,40 @@ And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after the tensor ### 17. Failed to reinstall vllm-ascend from source after uninstalling vllm-ascend? You may encounter the problem of C compilation failure when reinstalling vllm-ascend from source using pip. If the installation fails, it is recommended to use `python setup.py install` to install, or use `python setup.py clean` to clear the cache. + +### 18. How to generate deterministic results when using vllm-ascend? +There are several factors that affect output determinism: + +1. Sampling method: use **greedy sampling** by setting `temperature=0` in `SamplingParams`, e.g.: + +```python +from vllm import LLM, SamplingParams + +prompts = [ + "Hello, my name is", + "The president of the United States is", + "The capital of France is", + "The future of AI is", +] + +# Create a sampling params object. +sampling_params = SamplingParams(temperature=0) +# Create an LLM. +llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct") + +# Generate texts from the prompts. +outputs = llm.generate(prompts, sampling_params) +for output in outputs: + prompt = output.prompt + generated_text = output.outputs[0].text + print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") +``` + +2. 
Set the following environment variables: + +```bash +export LCCL_DETERMINISTIC=1 +export HCCL_DETERMINISTIC=1 +export ATB_MATMUL_SHUFFLE_K_ENABLE=0 +export ATB_LLM_LCOC_ENABLE=0 +``` diff --git a/docs/source/index.md b/docs/source/index.md index c4660c5..5c74cbc 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -57,6 +57,7 @@ user_guide/release_notes :caption: Developer Guide :maxdepth: 1 developer_guide/contributing +developer_guide/testing developer_guide/versioning_policy developer_guide/evaluation/index :::