[Feat][Doc] Add a load_balance_dp_proxy in examples and external dp doc. (#4265)

### What this PR does / why we need it?
This PR adds a load-balancing dp proxy server that can be used in
external DP scenarios where Disaggregated-Prefill is not enabled. It also
adds documentation for external dp and the load-balancing dp proxy server.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
See the new doc.

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
whx
2025-11-21 16:33:23 +08:00
committed by GitHub
parent 6c157cb75a
commit a5554b6661
6 changed files with 514 additions and 18 deletions


@@ -1,4 +1,4 @@
Here is an example guiding how to use `launch_online_dp.py` to launch external dp server in vllm. User can easily launch external dp server following the steps below:
This example shows how to use `launch_online_dp.py` to launch external dp vllm servers. Users can launch external dp servers by following the steps below:
### Modify parameters in `run_dp_template.sh`
`run_dp_template.sh` is a template script used to launch each dp vllm instance separately. It is called by `launch_online_dp.py` in multiple threads, and most of its configuration is set by `launch_online_dp.py`. Parameters you need to set manually include:
@@ -36,3 +36,19 @@ python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-s
python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-start 2 --dp-address x.x.x.x --dp-rpc-port 12342
```
### (Optional) Run `dp_load_balance_proxy_server.py` to load balance requests between external dp servers
With external dp, load balancing between the dp instances happens outside vllm, so you need to handle it yourself by implementing a custom proxy server. We provide an example of a request-length-aware dp load-balancing proxy server. The arguments of `dp_load_balance_proxy_server.py` are:
1. `--port`: port of the proxy server, default 8000
2. `--host`: host address of the proxy server, default localhost
3. `--dp-hosts`: host addresses of the external dp servers
4. `--dp-ports`: ports of the external dp servers; the number of dp ports must equal the number of dp hosts
5. `--max-retries`: maximum number of retries for HTTP requests, default 3
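The requirement that `--dp-hosts` and `--dp-ports` have the same number of entries can be illustrated with a minimal sketch. The helper name `pair_backends` is hypothetical and is not taken from `dp_load_balance_proxy_server.py`; it only shows one way the two lists could be matched up position by position.

```python
def pair_backends(hosts, ports):
    """Pair each dp host with the dp port at the same position.

    Hypothetical helper for illustration only; the real proxy script
    may validate and store its backends differently.
    """
    if len(hosts) != len(ports):
        raise ValueError(
            f"got {len(hosts)} dp hosts but {len(ports)} dp ports; counts must match"
        )
    # Ports arrive as strings on the command line, so convert them to int.
    return [(host, int(port)) for host, port in zip(hosts, ports)]


backends = pair_backends(["x.x.x.a", "x.x.x.b"], ["10001", "10002"])
print(backends)
```

Under this assumption, the first host is paired with the first port, the second host with the second port, and a mismatch in counts is rejected before any server starts.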
For example, if you have two external dp servers running at x.x.x.a:10001 and x.x.x.b:10002, you can start the proxy server with:
```(python)
python dp_load_balance_proxy_server.py --host x.x.x.c --port 8000 --dp-hosts x.x.x.a x.x.x.b --dp-ports 10001 10002
```
The proxy server then serves as the entry point for inference requests at x.x.x.c:8000 and load balances incoming requests between the two external dp servers according to request length.
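To give a feel for what request-length-aware load balancing can mean, here is a minimal sketch. It assumes a simple policy that is not confirmed by the source: each backend carries a running score equal to the total length of its in-flight requests, and a new request goes to the backend with the lowest score. The class `LengthAwareBalancer` and its methods are hypothetical names, and the actual policy in `dp_load_balance_proxy_server.py` may differ.

```python
class LengthAwareBalancer:
    """Toy balancer: route each request to the backend with the least pending length.

    Illustrative assumption only, not the proxy server's actual code.
    """

    def __init__(self, backends):
        # Pending load per backend, measured in request length (e.g. characters or tokens).
        self.load = {backend: 0 for backend in backends}

    def acquire(self, request_length):
        # Pick the least-loaded backend; ties go to the earliest-listed backend.
        backend = min(self.load, key=self.load.get)
        self.load[backend] += request_length
        return backend

    def release(self, backend, request_length):
        # Call when the request finishes so its length no longer counts.
        self.load[backend] -= request_length


balancer = LengthAwareBalancer(["x.x.x.a:10001", "x.x.x.b:10002"])
first = balancer.acquire(request_length=900)   # both idle, first backend chosen
second = balancer.acquire(request_length=100)  # second backend is idle
third = balancer.acquire(request_length=50)    # second backend still has less pending
balancer.release(first, 900)
print(first, second, third, balancer.load)
```

In this sketch a single long request keeps one backend "busy" while several short requests are routed to the other, which is the general idea behind balancing by length rather than by request count.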