xc-llm-ascend/docs/source/developer_guide/contribution/testing.md

# Testing

This secition explains how to write e2e tests and unit tests to verify the implementation of your feature.

## Setup test environment

The fastest way to setup test environment is to use the main branch container image:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:selected:
:sync: cpu

You can run the unit tests on CPU with the following steps:

```{code-block} bash
   :substitutions:

cd ~/vllm-project/
# ls
# vllm  vllm-ascend

# Use mirror to speedup download
# docker pull quay.nju.edu.cn/ascend/cann:|cann_image_tag|
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm --name vllm-ascend-ut \
    -v $(pwd):/vllm-project \
    -v ~/.cache:/root/.cache \
    -ti $IMAGE bash

# (Optional) Configure mirror to speedup download
sed -i 's|ports.ubuntu.com|mirrors.huaweicloud.com|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple/

# For torch-npu dev version or x86 machine
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"

apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2

# Install vllm
cd /vllm-project/vllm
VLLM_TARGET_DEVICE=empty python3 -m pip -v install .

# Install vllm-ascend
cd /vllm-project/vllm-ascend
# [IMPORTANT] Import LD_LIBRARY_PATH to enumerate the CANN environment under CPU
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
python3 -m pip install -r requirements-dev.txt
python3 -m pip install -v .
```

::::

::::{tab-item} Single card
:sync: single

```{code-block} bash
   :substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

After starting the container, you should install the required packages:

```bash
# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

::::

::::{tab-item} Multi cards
:sync: multi

```{code-block} bash
   :substitutions:
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

After starting the container, you should install the required packages:

```bash
cd /vllm-workspace/vllm-ascend/

# Prepare
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

::::

:::::

## Running tests

### Unit test

There are several principles to follow when writing unit tests:

- The test file path should be consistent with source file and start with `test_` prefix, such as: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`
- The vLLM Ascend test are using unittest framework, see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to understand how to write unit tests.
- All unit tests can be run on CPU, so you must mock the device-related function to host.
- Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).
- You can run the unit tests using `pytest`:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:selected:
:sync: cpu

```bash
# Run unit tests
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib
TORCH_DEVICE_BACKEND_AUTOLOAD=0 pytest -sv tests/ut
```

::::

::::{tab-item} Single card
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card the tests
pytest -sv tests/ut

# Run single test
pytest -sv tests/ut/test_ascend_config.py
```
::::

::::{tab-item} Multi cards test
:sync: multi

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card the tests
pytest -sv tests/ut

# Run single test
pytest -sv tests/ut/test_ascend_config.py
```
::::

:::::

### E2E test

Although vllm-ascend CI provide [e2e test](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend CI, you can run it
locally.

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Local (CPU)
:sync: cpu

You can't run e2e test on CPU.
::::

::::{tab-item} Single card
:selected:
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card the tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```
::::

::::{tab-item} Multi cards test
:sync: multi
```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card the tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```
::::

:::::

This will reproduce e2e test: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

#### E2E test example:

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
- Online test examples: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
- Reduced Layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)

    The CI resource is limited, you might need to reduce layer number of the model, below is an example of how to generate a reduced layer model:
    1. Fork the original model repo in modelscope, we need all the files in the repo except for weights.
    2. Set `num_hidden_layers` to the expected number of layers, e.g., `{"num_hidden_layers": 2,}`
    3. Copy the following python script as `generate_random_weight.py`. Set the relevant parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed:

        ```python
        import torch
        from transformers import AutoTokenizer, AutoConfig
        from modeling_deepseek import DeepseekV3ForCausalLM
        from modelscope import snapshot_download

        MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
        DIST_DTYPE = torch.bfloat16
        DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

        config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)
        model = DeepseekV3ForCausalLM(config)
        model = model.to(DIST_DTYPE)
        model.save_pretrained(DIST_MODEL_PATH)
        ```

### Run doctest

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files.
The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).
[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`# Testing`

			`This secition explains how to write e2e tests and unit tests to verify the implementation of your feature.`

			`## Setup test environment`

			`The fastest way to setup test environment is to use the main branch container image:`

			`:::::{tab-set}`
			`:sync-group: e2e`

Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`::::{tab-item} Local (CPU)`
[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`:selected:`
Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`:sync: cpu`

			`You can run the unit tests on CPU with the following steps:`

			```{code-block} bash
			`:substitutions:`

			`cd ~/vllm-project/`
			`# ls`
			`# vllm vllm-ascend`

			`# Use mirror to speedup download`
			`# docker pull quay.nju.edu.cn/ascend/cann:\|cann_image_tag\|`
			`export IMAGE=quay.io/ascend/cann:\|cann_image_tag\|`
			`docker run --rm --name vllm-ascend-ut \`
			`-v $(pwd):/vllm-project \`
			`-v ~/.cache:/root/.cache \`
			`-ti $IMAGE bash`

			`# (Optional) Configure mirror to speedup download`
			`sed -i 's\|ports.ubuntu.com\|mirrors.huaweicloud.com\|g' /etc/apt/sources.list`
			`pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple/`

			`# For torch-npu dev version or x86 machine`
			`export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"`

			`apt-get update -y`
			`apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2`

			`# Install vllm`
			`cd /vllm-project/vllm`
			`VLLM_TARGET_DEVICE=empty python3 -m pip -v install .`

			`# Install vllm-ascend`
			`cd /vllm-project/vllm-ascend`
			`# [IMPORTANT] Import LD_LIBRARY_PATH to enumerate the CANN environment under CPU`
			`export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib`
			`python3 -m pip install -r requirements-dev.txt`
			`python3 -m pip install -v .`
			```

			`::::`

			`::::{tab-item} Single card`
[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`:sync: single`

			```{code-block} bash
			`:substitutions:`

			`# Update DEVICE according to your device (/dev/davinci[0-7])`
			`export DEVICE=/dev/davinci0`
			`# Update the vllm-ascend image`
			`export IMAGE=quay.io/ascend/vllm-ascend:main`
			`docker run --rm \`
			`--name vllm-ascend \`
			`--device $DEVICE \`
			`--device /dev/davinci_manager \`
			`--device /dev/devmm_svm \`
			`--device /dev/hisi_hdc \`
			`-v /usr/local/dcmi:/usr/local/dcmi \`
			`-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \`
			`-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \`
			`-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \`
			`-v /etc/ascend_install.info:/etc/ascend_install.info \`
			`-v /root/.cache:/root/.cache \`
			`-p 8000:8000 \`
			`-it $IMAGE bash`
			```

Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`After starting the container, you should install the required packages:`

			```bash
			`# Prepare`
			`pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple`

			`# Install required packages`
			`pip install -r requirements-dev.txt`
			```

[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`::::`

			`::::{tab-item} Multi cards`
			`:sync: multi`

			```{code-block} bash
			`:substitutions:`
			`# Update the vllm-ascend image`
			`export IMAGE=quay.io/ascend/vllm-ascend:main`
			`docker run --rm \`
			`--name vllm-ascend \`
			`--device /dev/davinci0 \`
			`--device /dev/davinci1 \`
			`--device /dev/davinci2 \`
			`--device /dev/davinci3 \`
			`--device /dev/davinci_manager \`
			`--device /dev/devmm_svm \`
			`--device /dev/hisi_hdc \`
			`-v /usr/local/dcmi:/usr/local/dcmi \`
			`-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \`
			`-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \`
			`-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \`
			`-v /etc/ascend_install.info:/etc/ascend_install.info \`
			`-v /root/.cache:/root/.cache \`
			`-p 8000:8000 \`
			`-it $IMAGE bash`
			```

			`After starting the container, you should install the required packages:`

			```bash
Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`cd /vllm-workspace/vllm-ascend/`

[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`# Prepare`
			`pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple`

			`# Install required packages`
			`pip install -r requirements-dev.txt`
			```

Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`::::`

			`:::::`

[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`## Running tests`

			`### Unit test`

			`There are several principles to follow when writing unit tests:`

			- The test file path should be consistent with source file and start with `test_` prefix, such as: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`
			`- The vLLM Ascend test are using unittest framework, see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to understand how to write unit tests.`
			`- All unit tests can be run on CPU, so you must mock the device-related function to host.`
			`- Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).`
			- You can run the unit tests using `pytest`:

Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`:::::{tab-set}`
			`:sync-group: e2e`

			`::::{tab-item} Local (CPU)`
			`:selected:`
			`:sync: cpu`

			```bash
			`# Run unit tests`
			`export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/$(uname -m)-linux/devlib`
[Doc] Add qwen3 embedding 8b guide (#1734) 1. Add the tutorials for qwen3-embedding-8b 2. Remove VLLM_USE_V1=1 in docs, it's useless any more from 0.9.2 - vLLM version: v0.9.2 - vLLM main: https://github.com/vllm-project/vllm/commit/5923ab9524e32006ffb9354c5340b6988a45fe3e Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> 2025-07-11 17:40:17 +08:00			`TORCH_DEVICE_BACKEND_AUTOLOAD=0 pytest -sv tests/ut`
Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			```

			`::::`

			`::::{tab-item} Single card`
			`:sync: single`
[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00
Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			```bash
			`cd /vllm-workspace/vllm-ascend/`
			`# Run all single card the tests`
			`pytest -sv tests/ut`

			`# Run single test`
			`pytest -sv tests/ut/test_ascend_config.py`
			```
			`::::`

			`::::{tab-item} Multi cards test`
			`:sync: multi`

			```bash
			`cd /vllm-workspace/vllm-ascend/`
			`# Run all single card the tests`
			`pytest -sv tests/ut`

			`# Run single test`
			`pytest -sv tests/ut/test_ascend_config.py`
			```
			`::::`

			`:::::`
[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00
			`### E2E test`

			`Although vllm-ascend CI provide [e2e test](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend CI, you can run it`
			`locally.`

			`:::::{tab-set}`
			`:sync-group: e2e`

Add unit test local cpu guide and enable base testcase (#1566) ### What this PR does / why we need it? Use Base test and cleanup all manaul patch code - Cleanup EPLB config to avoid tmp test file - Use BaseTest with global cache - Add license - Add a doc to setup unit test in local env ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com> 2025-07-06 10:42:27 +08:00			`::::{tab-item} Local (CPU)`
			`:sync: cpu`

			`You can't run e2e test on CPU.`
			`::::`

[Doc] Update FAQ and add test guidance (#1360) ### What this PR does / why we need it? - Add test guidance - Add reduce layer guidance - update faq on determinitic calculation --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> 2025-06-25 09:59:23 +08:00			`::::{tab-item} Single card`
			`:selected:`
			`:sync: single`

			```bash
			`cd /vllm-workspace/vllm-ascend/`
			`# Run all single card the tests`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/`

			`# Run a certain test script`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py`

			`# Run a certain case in test script`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models`
			```
			`::::`

			`::::{tab-item} Multi cards test`
			`:sync: multi`
			```bash
			`cd /vllm-workspace/vllm-ascend/`
			`# Run all single card the tests`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/`

			`# Run a certain test script`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py`

			`# Run a certain case in test script`
			`VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models`
			```
			`::::`

			`:::::`

			`This will reproduce e2e test: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).`

			`#### E2E test example:`

			- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
			- Online test examples: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
			- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
			`- Reduced Layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)`

			`The CI resource is limited, you might need to reduce layer number of the model, below is an example of how to generate a reduced layer model:`
			`1. Fork the original model repo in modelscope, we need all the files in the repo except for weights.`
			2. Set `num_hidden_layers` to the expected number of layers, e.g., `{"num_hidden_layers": 2,}`
			3. Copy the following python script as `generate_random_weight.py`. Set the relevant parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed:

			```python
			`import torch`
			`from transformers import AutoTokenizer, AutoConfig`
			`from modeling_deepseek import DeepseekV3ForCausalLM`
			`from modelscope import snapshot_download`

			`MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"`
			`DIST_DTYPE = torch.bfloat16`
			`DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"`

			`config = AutoConfig.from_pretrained(MODEL_LOCAL_PATH, trust_remote_code=True)`
			`model = DeepseekV3ForCausalLM(config)`
			`model = model.to(DIST_DTYPE)`
			`model.save_pretrained(DIST_MODEL_PATH)`
			```

			`### Run doctest`

			vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files.
			`The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows:`

			```bash
			`# Run doctest`
			`/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh`
			```

			`This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).`