
This example shows how to use `launch_online_dp.py` to launch external DP vLLM servers. Follow the steps below:

### Modify parameters in `run_dp_template.sh`

`run_dp_template.sh` is a template script used to launch each DP vLLM instance separately. `launch_online_dp.py` calls it from multiple threads and sets most of its configuration. The parameters you need to set manually are:

1. The IP and socket_ifname of your machine. When running on multiple nodes, make sure the script on each node is set with the correct IP and socket_ifname of that node.
2. vLLM serving parameters, including model_path and other configurations. Note that the port, the DP-related parameters, and tp_size are set by `launch_online_dp.py`; all other vLLM parameters in this file serve only as an example, and you are free to modify them as needed.

### Run `launch_online_dp.py` with command-line arguments

All the arguments that can be set by users are:

1. `--dp-size`: global data parallel size, must be set
2. `--tp-size`: tensor parallel size, default 1
3. `--dp-size-local`: local data parallel size, defaults to `dp_size`
4. `--dp-rank-start`: Starting rank for data parallel, default 0
5. `--dp-address`: IP address of data parallel master node
6. `--dp-rpc-port`: Port of data parallel master node, default 12345
7. `--vllm-start-port`: Starting port of vLLM serving instances, default 9000
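
The flags and defaults listed above can be sketched with `argparse` as follows. This is a minimal illustration, not the actual parser in `launch_online_dp.py`: the function names `build_parser` and `parse_args` are hypothetical, and treating `--dp-address` as required is an assumption, since no default is documented for it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Launch external DP vLLM servers")
    parser.add_argument("--dp-size", type=int, required=True,
                        help="global data parallel size")
    parser.add_argument("--tp-size", type=int, default=1,
                        help="tensor parallel size")
    # None is a placeholder; it is resolved to dp_size in parse_args().
    parser.add_argument("--dp-size-local", type=int, default=None,
                        help="local data parallel size")
    parser.add_argument("--dp-rank-start", type=int, default=0,
                        help="starting rank for data parallel")
    # Assumption: no default is documented, so the flag is required here.
    parser.add_argument("--dp-address", type=str, required=True,
                        help="IP address of the data parallel master node")
    parser.add_argument("--dp-rpc-port", type=int, default=12345,
                        help="port of the data parallel master node")
    parser.add_argument("--vllm-start-port", type=int, default=9000,
                        help="starting port of the vLLM serving instances")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and apply the documented fallback for --dp-size-local."""
    args = build_parser().parse_args(argv)
    if args.dp_size_local is None:
        args.dp_size_local = args.dp_size
    return args
```

Calling `parse_args(["--dp-size", "4", "--dp-address", "x.x.x.x"])` leaves `dp_size_local` at 4, because the local size falls back to the global size when it is not given.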

An example of running external DP on a single node:

```shell
cd examples/external_online_dp
# Run DP4 TP4 on one node with 16 NPUs
python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 4 --dp-rank-start 0 --dp-address x.x.x.x --dp-rpc-port 12342
```

An example of running external DP on two nodes:

```shell
cd examples/external_online_dp
# Run DP4 TP4 on two nodes with 8 NPUs each

# On node 0:
python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-start 0 --dp-address x.x.x.x --dp-rpc-port 12342

# On node 1:
python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-start 2 --dp-address x.x.x.x --dp-rpc-port 12342
```
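
The `--dp-size-local` and `--dp-rank-start` values in the examples above follow from simple arithmetic. The sketch below derives them, assuming identical nodes that are filled in order and that each DP instance occupies `tp_size` NPUs. The helper `node_launch_flags` is hypothetical and is not part of `launch_online_dp.py`.

```python
def node_launch_flags(dp_size: int, tp_size: int, npus_per_node: int) -> list[dict]:
    """Return the per-node --dp-size-local and --dp-rank-start values.

    Assumes homogeneous nodes and that each DP instance uses tp_size NPUs.
    """
    if tp_size < 1 or npus_per_node < tp_size:
        raise ValueError("each node must hold at least one TP group")
    if npus_per_node % tp_size != 0:
        raise ValueError("npus_per_node must be a multiple of tp_size")

    # Number of DP instances that fit on one node, capped by the global size.
    dp_size_local = min(npus_per_node // tp_size, dp_size)
    if dp_size % dp_size_local != 0:
        raise ValueError("dp_size must be a multiple of the per-node DP size")

    # Node k hosts global ranks [k * dp_size_local, (k + 1) * dp_size_local).
    return [
        {"dp_size_local": dp_size_local, "dp_rank_start": start}
        for start in range(0, dp_size, dp_size_local)
    ]
```

For DP4 TP4 with 8 NPUs per node, this yields two entries with `dp_rank_start` 0 and 2 and `dp_size_local` 2, which matches the two commands above. With 16 NPUs per node it yields a single entry with `dp_size_local` 4.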

### (Optional) Run `dp_load_balance_proxy_server.py` to load balance requests between external DP servers

With external DP servers, you must handle load balancing across the DP instances outside of vLLM by implementing your own proxy server. We provide an example of a request-length-aware DP load-balancing proxy server. The arguments of `dp_load_balance_proxy_server.py` are:

1. `--port`: port of proxy server, default 8000
2. `--host`: host address of proxy server, default localhost
3. `--dp-hosts`: host addresses of external dp servers
4. `--dp-ports`: ports of external dp servers; the number of ports must match the number of hosts
5. `--max-retries`: Max number of retries for HTTP requests, default 3
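
The pairing rule between `--dp-hosts` and `--dp-ports` can be illustrated with a short check. This is only a sketch: the helper `pair_backends` is hypothetical, and the `http://` scheme is an assumption about how the backends are addressed.

```python
def pair_backends(dp_hosts: list[str], dp_ports: list[int]) -> list[str]:
    """Pair the i-th host with the i-th port and return backend base URLs."""
    if len(dp_hosts) != len(dp_ports):
        raise ValueError(
            f"got {len(dp_hosts)} hosts but {len(dp_ports)} ports; counts must match"
        )
    # Assumption: backends are reached over plain HTTP.
    return [f"http://{host}:{port}" for host, port in zip(dp_hosts, dp_ports)]
```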

For example, if you have two external DP servers running at x.x.x.a:10001 and x.x.x.b:10002, you can start the proxy server with:

```shell
python dp_load_balance_proxy_server.py --host x.x.x.c --port 8000 --dp-hosts x.x.x.a x.x.x.b --dp-ports 10001 10002
```

The proxy then serves as the entry point for inference requests at x.x.x.c:8000 and load balances incoming requests between the two external DP servers according to request length.
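
One way to picture request-length-aware balancing is to track the total length of in-flight requests per backend and route each new request to the least-loaded one. The sketch below illustrates that idea only; the class `LengthAwareBalancer` is hypothetical and may differ from what `dp_load_balance_proxy_server.py` actually does.

```python
class LengthAwareBalancer:
    """Route each request to the backend with the least in-flight request length."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        # In-flight request length per backend, in whatever unit the caller uses
        # (for example prompt characters or tokens).
        self._load = {backend: 0 for backend in backends}

    def acquire(self, request_length: int) -> str:
        """Pick the least-loaded backend and account for the new request."""
        # Ties resolve to the backend listed first.
        backend = min(self._load, key=self._load.get)
        self._load[backend] += request_length
        return backend

    def release(self, backend: str, request_length: int) -> None:
        """Remove a finished request from the backend's load."""
        self._load[backend] -= request_length
```

A proxy built this way would call `acquire` before forwarding a request and `release` once the response completes, so a backend serving one long request receives fewer new requests than a backend serving several short ones.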