[Doc] Add local running multi-node nightly test case guide (#4884)
### What this PR does / why we need it?
Add a guide for running the multi-node nightly test cases locally, to help developers run them in their own environments.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Tested by running the multi-node test locally: following this document, the multi-node nightly e2e test starts successfully on a local setup.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
@@ -66,7 +66,67 @@ From the workflow perspective, we can see how the final test script is executed,
# fill with accuracy test kwargs
```
3. Running Locally (Optional)
Step 1. Add `cluster_hosts` to the config YAMLs
On every cluster host, add a `cluster_hosts` item to the model config YAML, such as [DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml), right after the `num_nodes` item, for example:

`cluster_hosts: ["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]`
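For context, the top of the model config might then look like the sketch below; only `num_nodes` and `cluster_hosts` come from this guide, and the values are illustrative:

```yaml
# Sketch of tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml
# after adding cluster_hosts (values are illustrative)
num_nodes: 2
# IP addresses of the cluster hosts; index 0 is the leader node
cluster_hosts: ["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]
```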
Step 2. Install the development environment
- Install the vllm-ascend development packages on every cluster host
```bash
cd /vllm-workspace/vllm-ascend
python3 -m pip install -r requirements-dev.txt
```
- Install AISBench on the first host (the leader node) in `cluster_hosts`
```bash
export AIS_BENCH_TAG="v3.0-20250930-master"
export AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git"

git clone -b ${AIS_BENCH_TAG} --depth 1 ${AIS_BENCH_URL} /vllm-workspace/vllm-ascend/benchmark
cd /vllm-workspace/vllm-ascend/benchmark
pip install -e . -r requirements/api.txt -r requirements/extra.txt
```
Step 3. Run the test locally
- Export environment variables
On the leader host (the first node, xxx.xxx.xxx.188):
```bash
export LWS_WORKER_INDEX=0
export WORKSPACE=/vllm-workspace
export CONFIG_YAML_PATH=DeepSeek-V3.yaml
export FAIL_TAG=FAIL_TAG
```
On the slave hosts (the other nodes, such as xxx.xxx.xxx.212):
```bash
export LWS_WORKER_INDEX=1
export WORKSPACE=/vllm-workspace
export CONFIG_YAML_PATH=DeepSeek-V3.yaml
export FAIL_TAG=FAIL_TAG
```
`LWS_WORKER_INDEX` is the index of this node in `cluster_hosts`; the node with index 0 is the leader, and slave node indices range over [1, num_nodes-1].
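As a hedged sketch (this derivation is not part of the test scripts; it only illustrates the indexing rule stated above), a node's index is its position in `cluster_hosts`:

```shell
# Illustrative only: derive LWS_WORKER_INDEX as the position of this
# node's IP in the cluster_hosts list (index 0 is the leader).
CLUSTER_HOSTS="xxx.xxx.xxx.188 xxx.xxx.xxx.212"  # from the config YAML
THIS_IP="xxx.xxx.xxx.212"                        # this node's address

idx=0
for host in $CLUSTER_HOSTS; do
  if [ "$host" = "$THIS_IP" ]; then
    # prints LWS_WORKER_INDEX=1 for the second host in the list
    echo "LWS_WORKER_INDEX=$idx"
  fi
  idx=$((idx + 1))
done
```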
- Run vLLM serve instances

Copy `run.sh` to every cluster host and run it to start vLLM:
```bash
cp /vllm-workspace/vllm-ascend/tests/e2e/nightly/multi_node/scripts/run.sh /vllm-workspace/
cd /vllm-workspace/
bash -x run.sh
```
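Once `run.sh` is up, one way to check readiness (a sketch, not part of the official scripts) is to poll the server's `/health` endpoint:

```shell
# Illustrative helper: poll the vLLM OpenAI-compatible server until its
# /health endpoint answers, or give up after the given number of tries.
wait_healthy() {
  port="$1"
  retries="$2"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf "http://localhost:${port}/health" > /dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready"
  return 1
}
```

For example, `wait_healthy 8000 60` waits up to a minute for a server on port 8000 (the port is an assumption; take the real value from your config YAML).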
4. Add the case to the nightly workflow

Currently, the multi-node test workflow is defined in [vllm_ascend_test_nightly_a2/a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test_nightly_a3.yaml):

```yaml
@@ -246,6 +246,8 @@ VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.p
This will reproduce the E2E test. See [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

To run nightly multi-node test cases locally, refer to the `Running Locally` section of [Multi Node Test](./multi_node_test.md).
#### E2E test example:
- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)