From 0eefbe75b682bedd6435fda669aad9aeec7b939f Mon Sep 17 00:00:00 2001
From: Nengjun Ma
Date: Thu, 11 Dec 2025 08:56:27 +0800
Subject: [PATCH] [Doc] Add local running multi-node nightly test case guide (#4884)

### What this PR does / why we need it?
Add a guide for running the multi-node nightly test cases locally, to help developers run them in their own environments.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
Tested by running the multi-node test locally. Following this document, the multi-node nightly e2e test can be started successfully.

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

---------

Signed-off-by: leo-pony

---
 .../contribution/multi_node_test.md           | 62 ++++++++++++++++++-
 .../developer_guide/contribution/testing.md   |  2 +
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/docs/source/developer_guide/contribution/multi_node_test.md b/docs/source/developer_guide/contribution/multi_node_test.md
index a57a19c6..c5d1ecbc 100644
--- a/docs/source/developer_guide/contribution/multi_node_test.md
+++ b/docs/source/developer_guide/contribution/multi_node_test.md
@@ -66,7 +66,67 @@ From the workflow perspective, we can see how the final test script is executed,
      # fill with accuracy test kwargs
      ```

3. Running Locally (Optional)

   Step 1. Add `cluster_hosts` to the config YAML

   On every cluster host, add a `cluster_hosts` item after the `num_nodes` item in the model config YAML, e.g. [DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml):

   `cluster_hosts: ["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]`

   Step 2.
Set up the development environment

   - Install the vllm-ascend development dependencies on every cluster host:

     ``` bash
     cd /vllm-workspace/vllm-ascend
     python3 -m pip install -r requirements-dev.txt
     ```

   - Install AISBench on the first host (the leader node) in `cluster_hosts`:

     ``` bash
     export AIS_BENCH_TAG="v3.0-20250930-master"
     export AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git"

     git clone -b ${AIS_BENCH_TAG} --depth 1 ${AIS_BENCH_URL} /vllm-workspace/vllm-ascend/benchmark
     cd /vllm-workspace/vllm-ascend/benchmark
     pip install -e . -r requirements/api.txt -r requirements/extra.txt
     ```

   Step 3. Run the test locally

   - Export environment variables

     On the leader host (the first node, xxx.xxx.xxx.188):

     ``` bash
     export LWS_WORKER_INDEX=0
     export WORKSPACE=/vllm-workspace
     export CONFIG_YAML_PATH=DeepSeek-V3.yaml
     export FAIL_TAG=FAIL_TAG
     ```

     On each worker host (the other nodes, e.g. xxx.xxx.xxx.212):

     ``` bash
     export LWS_WORKER_INDEX=1
     export WORKSPACE=/vllm-workspace
     export CONFIG_YAML_PATH=DeepSeek-V3.yaml
     export FAIL_TAG=FAIL_TAG
     ```

     `LWS_WORKER_INDEX` is the index of the node in `cluster_hosts`. The node with index 0 is the leader; worker node indices range over [1, num_nodes-1].

   - Run the vllm serve instances

     Copy run.sh to the workspace and run it on every cluster host to start vLLM:

     ``` bash
     cp /vllm-workspace/vllm-ascend/tests/e2e/nightly/multi_node/scripts/run.sh /vllm-workspace/
     cd /vllm-workspace/
     bash -x run.sh
     ```

4.
Add the case to the nightly workflow

   Currently, the multi-node test workflow is defined in [vllm_ascend_test_nightly_a2/a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test_nightly_a3.yaml):

   ```yaml

diff --git a/docs/source/developer_guide/contribution/testing.md b/docs/source/developer_guide/contribution/testing.md
index 20206979..df710af3 100644
--- a/docs/source/developer_guide/contribution/testing.md
+++ b/docs/source/developer_guide/contribution/testing.md
@@ -246,6 +246,8 @@ VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.p

This will reproduce the E2E test. See [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

To run the nightly multi-node test cases locally, refer to the `Running Locally` section of [Multi Node Test](./multi_node_test.md).

#### E2E test example:

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
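The guide above exports `LWS_WORKER_INDEX` by hand on each node, with the rule that the index is the node's position in `cluster_hosts`. As a convenience, a minimal sketch of deriving the index automatically; the `CLUSTER_HOSTS` array mirrors the `cluster_hosts` example entry, and `HOST_IP` is a hypothetical variable assumed to hold the current node's own address (neither name comes from the test scripts):

```shell
# Hypothetical helper: derive LWS_WORKER_INDEX from this node's position
# in the cluster_hosts list, instead of exporting it manually per node.
# CLUSTER_HOSTS mirrors the cluster_hosts entry in the config YAML;
# HOST_IP is assumed to hold this node's own address.
CLUSTER_HOSTS=("xxx.xxx.xxx.188" "xxx.xxx.xxx.212")
HOST_IP="xxx.xxx.xxx.212"

for i in "${!CLUSTER_HOSTS[@]}"; do
  if [ "${CLUSTER_HOSTS[$i]}" = "$HOST_IP" ]; then
    # Index 0 is the leader; 1..num_nodes-1 are worker nodes.
    export LWS_WORKER_INDEX=$i
    break
  fi
done

echo "LWS_WORKER_INDEX=${LWS_WORKER_INDEX}"
```

With the placeholder addresses above, the second node resolves to index 1, matching the manual `export LWS_WORKER_INDEX=1` in the guide.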