[DOC] add request forwarding (#6780)
### What this PR does / why we need it?
- New section: "Request Forwarding" documentation in
docs/source/tutorials/models/DeepSeek-V3.2.md
- Environment fix: Changed VLLM_ASCEND_ENABLE_FLASHCOMM1 from 0 to 1 in
the DeepSeek-V3 configuration examples
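
For reviewers, a minimal sketch of how the host and port lists in the documented proxy command appear to line up. It assumes the proxy pairs the Nth `--prefiller-hosts` entry with the Nth `--prefiller-ports` entry (and likewise for the decoder lists). The equal list lengths in the example suggest this, but the PR does not state it. `pair_endpoints` is a hypothetical helper written for this illustration, not part of vllm-ascend.

```shell
#!/bin/sh
# Sketch only: assumes positional pairing of hosts and ports (not confirmed by this PR).
# pair_endpoints is a hypothetical helper; it uses plain (global) shell variables.

pair_endpoints() {
  # $1: space-separated hosts, $2: space-separated ports
  hosts=$1
  ports=$2
  # Count entries via word splitting (safe here: IPs and ports contain no glob characters).
  set -- $hosts
  n_hosts=$#
  set -- $ports
  n_ports=$#
  if [ "$n_hosts" -ne "$n_ports" ]; then
    echo "host/port count mismatch: $n_hosts vs $n_ports" >&2
    return 1
  fi
  for host in $hosts; do
    # Take the first remaining port, then drop it from the list.
    set -- $ports
    port=$1
    shift
    ports=$*
    echo "$host:$port"
  done
}

# Values copied from the proxy command in the new "Request Forwarding" section.
PREFILLER_HOSTS="141.61.39.105 141.61.39.113"
PREFILLER_PORTS="9100 9100"
DECODER_HOSTS="141.61.39.117 141.61.39.117 141.61.39.117 141.61.39.117 141.61.39.181 141.61.39.181 141.61.39.181 141.61.39.181"
DECODER_PORTS="9100 9101 9102 9103 9100 9101 9102 9103"

echo "prefillers:"
pair_endpoints "$PREFILLER_HOSTS" "$PREFILLER_PORTS"
echo "decoders:"
pair_endpoints "$DECODER_HOSTS" "$DECODER_PORTS"
```

Under that assumption, each decoder node contributes four endpoints on ports 9100 to 9103, which matches `--dp-size-local 4` with `--vllm-start-port 9100` in the launch commands. A mismatch between list lengths is reported on stderr with a non-zero return code.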
### Does this PR introduce _any_ user-facing change?
Documentation update only - provides new configuration guidance for
request forwarding setups
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: 9562912cea
---------
Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>
@@ -297,7 +297,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -350,7 +350,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -830,6 +830,37 @@ python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-s
 python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-start 4 --dp-address 141.61.39.117 --dp-rpc-port 12777 --vllm-start-port 9100
 ```
 
+### Request Forwarding
+
+To set up request forwarding, run the following script on any machine :download:`load_balance_proxy_server_example.py <examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py>`
+
+```shell
+unset http_proxy
+unset https_proxy
+
+python load_balance_proxy_server_example.py \
+--port 8000 \
+--host 0.0.0.0 \
+--prefiller-hosts \
+141.61.39.105 \
+141.61.39.113 \
+--prefiller-ports \
+9100 \
+9100 \
+--decoder-hosts \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.181 \
+141.61.39.181 \
+141.61.39.181 \
+141.61.39.181 \
+--decoder-ports \
+9100 9101 9102 9103 \
+9100 9101 9102 9103 \
+```
+
 ## Functional Verification
 
 Once your server is started, you can query the model with input prompts: