【main】[Doc]add 2P1D instruction for single node (#4716)

### What this PR does / why we need it?
Add the description for 2P1D, keeping it consistent with the content in
the dev branch.

### Does this PR introduce _any_ user-facing change?
no


- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: mazhixin000 <mazhixinkorea@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
mazhixin000, 2025-12-05 18:35:18 +08:00, committed by GitHub
parent 4b016b98a2, commit 3740b3edfc


@@ -45,7 +45,7 @@ bash gen_ranktable.sh --ips 192.0.0.1 \
--npus-per-node 2 --network-card-name eth0 --prefill-device-cnt 1 --decode-device-cnt 1
```
The rank table will be generated at /vllm-workspace/vllm-ascend/examples/disaggregate_prefill_v1/ranktable.json
If you want to run "2P1D", set `npus-per-node` to 3 and `prefill-device-cnt` to 2; the rank table is generated at the same path.
|Parameter | Meaning |
| --- | --- |
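Putting the 2P1D values into the command shown above gives something like the following. This is a hedged sketch: the IP address and NIC name are placeholders carried over from the earlier 1P1D example, not required values.

```shell
# Hypothetical 2P1D rank-table generation, adapting the command above:
# 3 NPUs per node, 2 devices for prefill, 1 for decode.
# The IP (192.0.0.1) and NIC name (eth0) are placeholders.
bash gen_ranktable.sh --ips 192.0.0.1 \
    --npus-per-node 3 --network-card-name eth0 \
    --prefill-device-cnt 2 --decode-device-cnt 1
```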
@@ -137,6 +137,8 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
:::::
If you want to run "2P1D", set `ASCEND_RT_VISIBLE_DEVICES`, `VLLM_ASCEND_LLMDD_RPC_PORT`, and `port` to different values for each prefiller (P) process.
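As a runnable sketch of the "different values for each P process" rule, the loop below derives a distinct device ID, RPC port, and serve port for each of the two prefillers. The base values are illustrative placeholders; only the requirement that they differ per process comes from the text above.

```shell
# Derive distinct settings for the two prefillers in a 2P1D setup.
# Base ports below are placeholders, not values mandated by vllm-ascend.
BASE_RPC_PORT=5559
BASE_SERVE_PORT=13700
for i in 0 1; do
    DEVICE=$i
    RPC_PORT=$((BASE_RPC_PORT + i))
    SERVE_PORT=$((BASE_SERVE_PORT + 10 * i))
    echo "prefiller $i: ASCEND_RT_VISIBLE_DEVICES=$DEVICE VLLM_ASCEND_LLMDD_RPC_PORT=$RPC_PORT --port $SERVE_PORT"
done
```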
## Example Proxy for Deployment
Run a proxy server on the same node as the prefiller service instance. The proxy program is available in the repository's examples: [load\_balance\_proxy\_server\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)
@@ -151,6 +153,12 @@ python load_balance_proxy_server_example.py \
--decoder-ports 13701
```
|Parameter | Meaning |
| --- | --- |
| --port | Port the proxy server listens on |
| --prefiller-port | Ports of all prefiller instances |
| --decoder-ports | Ports of all decoder instances |
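For a 2P1D deployment, the proxy needs the serve ports of both prefillers. A hedged sketch follows, assuming the prefiller-port flag accepts multiple space-separated values the way `--decoder-ports` appears to; the port numbers are placeholders, and the exact flag syntax should be confirmed with `python load_balance_proxy_server_example.py --help`.

```shell
# Hypothetical 2P1D proxy launch: pass both prefiller serve ports.
# Flag syntax for multiple prefiller ports is an assumption here.
python load_balance_proxy_server_example.py \
    --port 8000 \
    --prefiller-port 13700 13710 \
    --decoder-ports 13701
```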
## Verification
Check service health using the proxy server endpoint.
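A minimal end-to-end check can be made through the proxy, assuming it listens on port 8000 (a placeholder from the sketch above) and forwards vLLM's OpenAI-compatible API; the model path matches the serve command shown earlier.

```shell
# Send one completion request through the proxy endpoint.
# Port 8000 is a placeholder; use the value passed to --port.
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "/model/Qwen2.5-VL-7B-Instruct",
          "prompt": "Hello,",
          "max_tokens": 16
        }'
```

A successful response confirms that the proxy can reach both the prefiller and decoder instances.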