EngineX-Ascend/enginex-ascend-910-vllm

Yang Jun01 9149384e03 v0.10.1rc1

2025-09-09 09:40:35 +08:00

Testing

This section explains how to write e2e tests and unit tests to verify the implementation of your feature.

Setup test environment

The fastest way to set up a test environment is to use the main branch container image:

:::::{tab-set} :sync-group: e2e

::::{tab-item} Local (CPU) :selected: :sync: cpu

You can run the unit tests on CPU with the following steps:

cd ~/vllm-project/
# ls
# vllm  vllm-ascend

# Use a mirror to speed up the download
# docker pull quay.nju.edu.cn/ascend/cann:|cann_image_tag|
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm --name vllm-ascend-ut \
    -v $(pwd):/vllm-project \
    -v ~/.cache:/root/.cache \
    -ti $IMAGE bash

# (Optional) Configure mirrors to speed up downloads
sed -i 's|ports.ubuntu.com|mirrors.huaweicloud.com|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple/

# For torch-npu dev version or x86 machine
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"

apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2

# Install vllm
cd /vllm-project/vllm
VLLM_TARGET_DEVICE=empty python3 -m pip -v install .

# Install vllm-ascend
cd /vllm-project/vllm-ascend
# [IMPORTANT] Add the CANN devlib directory to LD_LIBRARY_PATH so the CANN libraries can be found on a CPU-only host
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
python3 -m pip install -r requirements-dev.txt
python3 -m pip install -v .

::::

::::{tab-item} Single card :sync: single

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash

After starting the container, you should install the required packages:

cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt

::::

::::{tab-item} Multi cards :sync: multi

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash

After starting the container, you should install the required packages:

cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt

::::

:::::
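
The single card and multi cards commands above differ only in how many `/dev/davinciN` devices are passed to `docker run`. As a rough sketch of that pattern, the helper below builds the `--device` arguments for a list of card indices. The function name `device_flags` and the constant `MANAGEMENT_DEVICES` are hypothetical and exist only for this illustration; the device paths are the ones used in the commands above.

```python
from typing import List

# Management devices mounted in both the single card and multi cards commands above.
MANAGEMENT_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]


def device_flags(card_ids: List[int]) -> List[str]:
    """Return the `--device` arguments for the given davinci card indices.

    Hypothetical helper for illustration; it is not part of vllm-ascend.
    """
    if not card_ids:
        raise ValueError("at least one card index is required")
    flags: List[str] = []
    # One --device flag per NPU card.
    for card_id in card_ids:
        flags += ["--device", f"/dev/davinci{card_id}"]
    # The management devices are always needed, whatever the card count.
    for device in MANAGEMENT_DEVICES:
        flags += ["--device", device]
    return flags


# Four cards, as in the multi cards tab.
print(" ".join(device_flags([0, 1, 2, 3])))
```

The printed string can be pasted in place of the hand-written `--device` lines when a different set of cards is needed.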

Running tests

Unit test

There are several principles to follow when writing unit tests:

The test file path should mirror the source file path, and the file name should start with the test_ prefix, for example: vllm_ascend/worker/worker_v1.py --> tests/ut/worker/test_worker_v1.py
The vLLM Ascend tests use the unittest framework; see the Python unittest documentation to understand how to write unit tests.
All unit tests must be runnable on CPU, so you must mock device-related functions so that they run on the host.
Example: tests/ut/test_ascend_config.py.
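
The principle of mocking device-related functions can be sketched as follows. This is a minimal illustration, not code from the repository: `FakeDeviceModule` and `pick_tensor_parallel_size` are hypothetical names standing in for a device library and a function under test. Only `unittest` and `unittest.mock` from the Python standard library are used.

```python
import unittest
from unittest import mock


class FakeDeviceModule:
    """Hypothetical stand-in for a device library; not a real vllm-ascend API."""

    @staticmethod
    def device_count() -> int:
        # On a CPU-only host a real device query would fail like this.
        raise RuntimeError("no NPU available on this host")


def pick_tensor_parallel_size(requested: int) -> int:
    """Hypothetical function under test: clamp the request to the visible devices."""
    return min(requested, FakeDeviceModule.device_count())


class TestPickTensorParallelSize(unittest.TestCase):

    @mock.patch.object(FakeDeviceModule, "device_count", return_value=4)
    def test_clamps_to_visible_devices(self, mock_count):
        # The device query is mocked, so the test needs no hardware.
        self.assertEqual(pick_tensor_parallel_size(8), 4)
        mock_count.assert_called_once()

    @mock.patch.object(FakeDeviceModule, "device_count", return_value=4)
    def test_keeps_smaller_request(self, _mock_count):
        self.assertEqual(pick_tensor_parallel_size(2), 2)
```

Saved as a test file under tests/ut/, a class like this is collected by pytest in the same way as the existing unit tests.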
You can run the unit tests using pytest:

:::::{tab-set} :sync-group: e2e

::::{tab-item} Local (CPU) :selected: :sync: cpu

# Run unit tests
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
TORCH_DEVICE_BACKEND_AUTOLOAD=0 pytest -sv tests/ut

::::

::::{tab-item} Single card :sync: single

cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py

::::

::::{tab-item} Multi cards test :sync: multi

cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py

::::

:::::
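
To find the test file that belongs to a source file, the path convention from the principles above (vllm_ascend/worker/worker_v1.py --> tests/ut/worker/test_worker_v1.py) can be expressed as a small helper. The function name `expected_test_path` is hypothetical and only serves this sketch.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map a source file under vllm_ascend/ to its unit test file under tests/ut/.

    Hypothetical helper for illustration; it is not part of vllm-ascend.
    """
    source = PurePosixPath(source_path)
    if not source.parts or source.parts[0] != "vllm_ascend":
        raise ValueError(f"expected a path under vllm_ascend/, got {source_path!r}")
    # Keep the sub-directories between the package root and the file name.
    relative_dir = PurePosixPath(*source.parts[1:-1])
    return str(PurePosixPath("tests/ut") / relative_dir / f"test_{source.name}")


print(expected_test_path("vllm_ascend/worker/worker_v1.py"))
# tests/ut/worker/test_worker_v1.py
```

The returned path can be passed directly to `pytest -sv` to run only the tests for that source file.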

E2E test

Although the vllm-ascend CI already runs the e2e tests on Ascend hardware, you can also run them locally.

:::::{tab-set} :sync-group: e2e

::::{tab-item} Local (CPU) :sync: cpu

You can't run e2e tests on CPU.

::::

::::{tab-item} Single card :selected: :sync: single

cd /vllm-workspace/vllm-ascend/
# Run all single card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models

::::

::::{tab-item} Multi cards test :sync: multi

cd /vllm-workspace/vllm-ascend/
# Run all multi card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models

::::

:::::

This reproduces the e2e tests run by the CI workflow: vllm_ascend_test.yaml.

E2E test example:

Offline test example: tests/e2e/singlecard/test_offline_inference.py
Online test example: tests/e2e/singlecard/test_prompt_embedding.py
Correctness test example: tests/e2e/singlecard/test_aclgraph.py

Reduced Layer model test example: test_torchair_graph_mode.py - DeepSeek-V3-Pruning

CI resources are limited, so you might need to reduce the number of layers of the model. The following steps show how to generate a reduced layer model:

Fork the original model repo in ModelScope. All files in the repo are needed except for the weights.
Set num_hidden_layers to the expected number of layers, e.g., {"num_hidden_layers": 2,}

Copy the following Python script as generate_random_weight.py. Set the parameters MODEL_LOCAL_PATH, DIST_DTYPE and DIST_MODEL_PATH as needed:

import os

import torch
from transformers import AutoConfig
# modeling_deepseek.py ships with the model repo; run this script where it is importable.
from modeling_deepseek import DeepseekV3ForCausalLM

# Local copy of the forked repo (config and code, no weights); "~" is expanded explicitly.
MODEL_LOCAL_PATH = os.path.expanduser("~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning")
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

# Build the model from the config only, so the weights are randomly initialized.
config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)
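
The second step above, setting num_hidden_layers, can also be scripted instead of edited by hand. The sketch below assumes the model config is stored as config.json in the model directory, which the steps above do not state explicitly. The function name `set_num_hidden_layers` is hypothetical, and the demonstration runs on a throwaway directory with made-up values rather than on a real model repo.

```python
import json
import tempfile
from pathlib import Path


def set_num_hidden_layers(model_dir: str, num_layers: int) -> dict:
    """Rewrite config.json in model_dir so that num_hidden_layers equals num_layers.

    Hypothetical helper; assumes the config file is named config.json.
    """
    config_path = Path(model_dir).expanduser() / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["num_hidden_layers"] = num_layers
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstration with placeholder values in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "config.json").write_text('{"num_hidden_layers": 8, "hidden_size": 16}')
    updated = set_num_hidden_layers(tmp_dir, 2)
    print(updated["num_hidden_layers"])  # 2
```

All other keys in the config are left unchanged, so the reduced model keeps the original architecture apart from its depth.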

Run doctest

vllm-ascend provides the script vllm-ascend/tests/e2e/run_doctests.sh to run all doctests in the doc files. Doctests are a good way to make sure the docs are up to date and the examples are executable. You can run them locally as follows:

# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh

This will reproduce the same environment as the CI: vllm_ascend_doctest.yaml.
