
# Multi Node Test

Multi-node CI is designed to test distributed scenarios for very large models, e.g., disaggregated prefill or multi-DP (data parallel) deployments that span multiple nodes.

## How it works

The following figure shows the basic deployment view of the multi-node CI mechanism. It illustrates how the GitHub Actions workflow interacts with [LWS](https://lws.sigs.k8s.io/docs/overview/) (LeaderWorkerSet, a Kubernetes custom resource).

![Multi-node CI deployment view](../../assets/deployment.png)

From the workflow perspective, the figure below shows how the final test script is executed. The key pieces are the two files [lws.yaml and run.sh](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/scripts). The former defines how the Kubernetes cluster is brought up, and the latter is the entry script executed when each pod starts. Each node runs different logic depending on its [LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference/labels-annotations-and-environment-variables/) environment variable, which lets multiple nodes form a distributed cluster to run the test.

![Multi-node CI workflow](../../assets/workflow.png)
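At its core, the per-node branching maps the worker index to a role. The minimal bash sketch below shows only that idea and is not taken from `run.sh`. The `node_role` function and its `leader`/`worker` labels are hypothetical. The only rule carried over from LWS is that index 0 is the leader.

```shell
# Illustrative sketch only: NOT the actual run.sh logic.
# node_role is a hypothetical helper that maps an LWS worker index to a role.
node_role() {
  # Prefer an explicit argument, then LWS_WORKER_INDEX, then default to 0.
  local index="${1:-${LWS_WORKER_INDEX:-0}}"
  if [ "$index" -eq 0 ]; then
    # Index 0 is the leader of the LWS group.
    echo "leader"
  else
    # Every other index is a worker.
    echo "worker"
  fi
}
```

For example, `node_role 0` prints `leader`, and `node_role 1` prints `worker`.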

## How to contribute

1. Upload custom weights

   If you need custom weights, you are welcome to upload them to ModelScope's [vllm-ascend](https://www.modelscope.cn/organization/vllm-ascend) organization. For example, you might have quantized DeepSeek-V3 to W8A8 and want CI to run with your weights. If you do not have permission to upload, please contact @Potabk.

2. Add a config YAML

    The entrypoint script [run.sh](https://github.com/vllm-project/vllm-ascend/blob/0bf3f21a987aede366ec4629ad0ffec8e32fe90d/tests/e2e/nightly/multi_node/scripts/run.sh#L106) shows what happens when a Kubernetes pod starts. The script traverses all `*.yaml` files in the [config directory](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/config/models), then reads and executes each configuration. All you need to do is add a YAML file like [DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml).

    Suppose you have **2 nodes** running a 1P1D setup (1 prefiller + 1 decoder). You might add a config file like the following:

```yaml
    test_name: "test DeepSeek-V3 disaggregated_prefill"
    # the model being tested
    model: "vllm-ascend/DeepSeek-V3-W8A8"
    # how large the cluster is
    num_nodes: 2
    npu_per_node: 16
    # add every env var you need here
    env_common:
      VLLM_USE_MODELSCOPE: true
      OMP_PROC_BIND: false
      OMP_NUM_THREADS: 100
      HCCL_BUFFSIZE: 1024
      SERVER_PORT: 8080
    disaggregated_prefill:
      enabled: true
      # list of node indexes that meet all of these conditions:
      #  - the node is a prefiller
      #  - the node is not headless (it runs an API server)
      prefiller_host_index: [0]
      # list of node indexes that meet all of these conditions:
      #  - the node is a decoder
      #  - the node is not headless (it runs an API server)
      decoder_host_index: [1]

    # add each node's vllm serve CLI command, just as you would run it locally
    deployment:
      -
        server_cmd: >
          vllm serve ...
      -
        server_cmd: >
          vllm serve ...
    benchmarks:
      perf:
        # fill with performance test kwargs
      acc:
        # fill with accuracy test kwargs
    ```
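
    Each node gets its own `server_cmd` under `deployment`. A quick local check can therefore catch a mismatch between `num_nodes` and the number of deployment entries before you push. The following is a minimal bash sketch. `check_config` is a hypothetical helper and is not part of vllm-ascend. It does line-based matching rather than real YAML parsing. It assumes `num_nodes` is a top-level key and that `server_cmd` appears exactly once per node.

    ```shell
    # Hypothetical pre-submit check; not part of the vllm-ascend repository.
    check_config() {
      local cfg="$1" num_nodes n_cmds
      # Read the value of the top-level "num_nodes: N" line.
      num_nodes=$(awk -F': *' '/^num_nodes:/ {print $2; exit}' "$cfg")
      # Count server_cmd keys (one per node); grep -c prints 0 on no match.
      n_cmds=$(grep -c 'server_cmd:' "$cfg" || true)
      if [ -z "$num_nodes" ]; then
        echo "missing num_nodes in $cfg" >&2
        return 1
      fi
      if [ "$num_nodes" -ne "$n_cmds" ]; then
        echo "num_nodes=$num_nodes but found $n_cmds server_cmd entries" >&2
        return 1
      fi
      echo "ok"
    }
    ```

    For example, `check_config tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml` should print `ok` when the two counts agree.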
  
3. Run locally (optional)

    Step 1. Add `cluster_hosts` to the config YAML

    On every cluster host, add a `cluster_hosts` item right after the `num_nodes` item in the config YAML you want to run (for example, [DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml)):
    `cluster_hosts: ["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]`
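
    If you prefer to script this edit instead of making it by hand on each host, the bash sketch below is one option. `add_cluster_hosts` is a hypothetical helper. It relies on GNU sed (`sed -i` and the one-line form of the `a` command), so on macOS you would need `gsed` or a manual edit.

    ```shell
    # Hypothetical helper: insert a cluster_hosts line right after num_nodes.
    # Requires GNU sed for in-place editing and the one-line "a text" form.
    add_cluster_hosts() {
      local cfg="$1" hosts="$2"
      # Do nothing if the key is already present, so reruns are harmless.
      if grep -q '^cluster_hosts:' "$cfg"; then
        return 0
      fi
      sed -i "/^num_nodes:/a cluster_hosts: ${hosts}" "$cfg"
    }
    ```

    For example: `add_cluster_hosts DeepSeek-V3.yaml '["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]'`.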

    Step 2. Install the development environment
    - Install the vllm-ascend development packages on every cluster host

        ``` bash
        cd /vllm-workspace/vllm-ascend
        python3 -m pip install -r requirements-dev.txt
        ```

    - Install AISBench on the first host (the leader node) in `cluster_hosts`

        ``` bash
        export AIS_BENCH_TAG="v3.0-20250930-master"
        export AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git"

        git clone -b ${AIS_BENCH_TAG} --depth 1 ${AIS_BENCH_URL} /vllm-workspace/vllm-ascend/benchmark
        cd /vllm-workspace/vllm-ascend/benchmark
        pip install -e . -r requirements/api.txt -r requirements/extra.txt
        ```

    Step 3. Run the test locally
    - Export the environment variables

        On the leader host (the first node, xxx.xxx.xxx.188):

        ``` bash
        export LWS_WORKER_INDEX=0
        export WORKSPACE=/vllm-workspace
        export CONFIG_YAML_PATH=DeepSeek-V3.yaml
        export FAIL_TAG=FAIL_TAG
        ```

        On each worker host (the other nodes, such as xxx.xxx.xxx.212):

        ``` bash
        export LWS_WORKER_INDEX=1
        export WORKSPACE=/vllm-workspace
        export CONFIG_YAML_PATH=DeepSeek-V3.yaml
        export FAIL_TAG=FAIL_TAG
        ```

        `LWS_WORKER_INDEX` is the index of this node in `cluster_hosts`. The node with index 0 is the leader, and the worker nodes use indexes in the range [1, num_nodes - 1].
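
        Instead of hard-coding the index on each host, you could look it up from the host's own address. `worker_index_for` in the sketch below is a hypothetical helper. It prints the position of a given IP in the host list. It assumes each host's address appears verbatim in `cluster_hosts`.

        ```shell
        # Hypothetical helper: print the position of an IP in the host list.
        # Usage: worker_index_for <this_ip> <host0> <host1> ...
        worker_index_for() {
          local ip="$1" i=0 host
          shift
          for host in "$@"; do
            if [ "$host" = "$ip" ]; then
              echo "$i"
              return 0
            fi
            i=$((i + 1))
          done
          echo "$ip is not listed in cluster_hosts" >&2
          return 1
        }
        ```

        Consider a Linux host where the first address reported by `hostname -I` is the one listed in `cluster_hosts`. On such a host you might run `LWS_WORKER_INDEX=$(worker_index_for "$(hostname -I | awk '{print $1}')" xxx.xxx.xxx.188 xxx.xxx.xxx.212) && export LWS_WORKER_INDEX`.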
    - Run the vllm serve instances

        On every cluster host, copy `run.sh` into the workspace and run it to start vLLM:

        ``` bash
        cp /vllm-workspace/vllm-ascend/tests/e2e/nightly/multi_node/scripts/run.sh /vllm-workspace/
        cd /vllm-workspace/
        bash -x run.sh
        ```
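
        `run.sh` depends on the variables exported in the previous step. A short preflight check before `bash -x run.sh` can therefore save a failed start. The bash sketch below is a hypothetical helper (`preflight`, not part of vllm-ascend). It checks only the four variables shown above plus the index range, and `run.sh` may need more.

        ```shell
        # Hypothetical preflight check; run it before `bash -x run.sh`.
        # Usage: preflight <num_nodes>
        preflight() {
          local num_nodes="$1" var index
          for var in LWS_WORKER_INDEX WORKSPACE CONFIG_YAML_PATH FAIL_TAG; do
            # printenv sees only exported variables, i.e. what run.sh inherits.
            if [ -z "$(printenv "$var")" ]; then
              echo "missing exported variable: $var" >&2
              return 1
            fi
          done
          index="$(printenv LWS_WORKER_INDEX)"
          # Reject non-numeric values before comparing numerically.
          case "$index" in
            *[!0-9]*) echo "LWS_WORKER_INDEX must be a non-negative integer" >&2; return 1 ;;
          esac
          # The leader is 0; workers use 1 .. num_nodes-1.
          if [ "$index" -ge "$num_nodes" ]; then
            echo "LWS_WORKER_INDEX=$index is out of range [0, $((num_nodes - 1))]" >&2
            return 1
          fi
          echo "ok"
        }
        ```

        For the 2-node example, you could run `preflight 2 && bash -x run.sh`.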

4. Add the case to the nightly workflow

   Currently, the multi-node test workflow is defined in [vllm_ascend_test_nightly_a2/a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test_nightly_a3.yaml):

   ```yaml
   multi-node-tests:
     needs: single-node-tests
     if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
     strategy:
       fail-fast: false
       max-parallel: 1
       matrix:
         test_config:
           - name: multi-node-deepseek-pd
             config_file_path: tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml
             size: 2
           - name: multi-node-qwen3-dp
             config_file_path: tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A22B.yaml
             size: 2
           - name: multi-node-dpsk-4node-pd
             config_file_path: tests/e2e/nightly/multi_node/config/models/DeepSeek-R1-W8A8.yaml
             size: 4
     uses: ./.github/workflows/_e2e_nightly_multi_node.yaml
     with:
       soc_version: a3
       image: m.daocloud.io/quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
       replicas: 1
       size: ${{ matrix.test_config.size }}
       config_file_path: ${{ matrix.test_config.config_file_path }}
   ```
  
   The matrix above defines all the parameters required to add a multi-node test case. If you are adding a new case, the parameters that matter most are `size` and `config_file_path`. The former is the number of nodes your case requires. The latter is the path to the config file you wrote in step 2.
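
In the examples above, `size` and the config's `num_nodes` describe the same node count. One way to keep them in sync is to generate the matrix entry from the config. The bash sketch below assumes the two should always be equal. `matrix_entry` is a hypothetical helper and is not part of the repository. Run it from the repository root so the printed path is repo-relative.

```shell
# Hypothetical helper: print a test_config matrix entry whose size is
# taken from the config's num_nodes, so the two values cannot drift apart.
matrix_entry() {
  local name="$1" cfg="$2" num_nodes
  num_nodes=$(awk -F': *' '/^num_nodes:/ {print $2; exit}' "$cfg")
  if [ -z "$num_nodes" ]; then
    echo "num_nodes not found in $cfg" >&2
    return 1
  fi
  printf '%s\n' "- name: $name" "  config_file_path: $cfg" "  size: $num_nodes"
}
```

The printed lines still need to be re-indented to match the `test_config:` list in the workflow file.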