[DOC] add request forwarding (#6780)
### What this PR does / why we need it?
- New section: "Request Forwarding" documentation in
docs/source/tutorials/models/DeepSeek-V3.2.md
- Environment fix: Changed VLLM_ASCEND_ENABLE_FLASHCOMM1 from 0 to 1 in
the DeepSeek-V3 configuration examples
### Does this PR introduce _any_ user-facing change?
Documentation update only: adds configuration guidance for
request forwarding setups.
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: 9562912cea
---------
Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>
@@ -297,7 +297,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -350,7 +350,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -830,6 +830,37 @@ python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-s
 python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-start 4 --dp-address 141.61.39.117 --dp-rpc-port 12777 --vllm-start-port 9100
 ```
 
+### Request Forwarding
+
+To set up request forwarding, run the following script on any machine :download:`load_balance_proxy_server_example.py <examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py>`
+
+```shell
+unset http_proxy
+unset https_proxy
+
+python load_balance_proxy_server_example.py \
+--port 8000 \
+--host 0.0.0.0 \
+--prefiller-hosts \
+141.61.39.105 \
+141.61.39.113 \
+--prefiller-ports \
+9100 \
+9100 \
+--decoder-hosts \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.117 \
+141.61.39.181 \
+141.61.39.181 \
+141.61.39.181 \
+141.61.39.181 \
+--decoder-ports \
+9100 9101 9102 9103 \
+9100 9101 9102 9103 \
+```
+
 ## Functional Verification
 
 Once your server is started, you can query the model with input prompts:
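The `--decoder-ports` list in the request-forwarding command lines up with the decode launch command shown above (`--dp-size-local 4 --vllm-start-port 9100`). Assuming that `launch_online_dp.py` starts one vLLM server per local DP rank on consecutive ports beginning at `--vllm-start-port`, and that the proxy pairs the i-th host with the i-th port, two decode nodes yield eight host/port pairs. The Python sketch below reproduces that bookkeeping under those assumptions; `pair_endpoints` and `local_dp_ports` are hypothetical helpers, not part of either script.

```python
# Hypothetical sketch: NOT the proxy's code. It only mirrors the assumed
# positional pairing of --*-hosts and --*-ports in the command above.
from typing import List, Tuple


def pair_endpoints(hosts: List[str], ports: List[int]) -> List[Tuple[str, int]]:
    """Pair the i-th host with the i-th port (assumed proxy behaviour)."""
    if len(hosts) != len(ports):
        raise ValueError(f"{len(hosts)} hosts but {len(ports)} ports")
    return list(zip(hosts, ports))


def local_dp_ports(vllm_start_port: int, dp_size_local: int) -> List[int]:
    """Assumed layout: one server per local DP rank, on consecutive ports."""
    return [vllm_start_port + rank for rank in range(dp_size_local)]


# Values copied from the request-forwarding command.
prefillers = pair_endpoints(["141.61.39.105", "141.61.39.113"], [9100, 9100])

# Two decode nodes, each assumed to run launch_online_dp.py with
# --dp-size-local 4 --vllm-start-port 9100.
decoder_hosts = ["141.61.39.117"] * 4 + ["141.61.39.181"] * 4
decoder_ports = local_dp_ports(9100, 4) * 2
decoders = pair_endpoints(decoder_hosts, decoder_ports)

if __name__ == "__main__":
    print(f"{len(prefillers)} prefill endpoints:")
    for host, port in prefillers:
        print(f"  http://{host}:{port}")
    print(f"{len(decoders)} decode endpoints:")
    for host, port in decoders:
        print(f"  http://{host}:{port}")
```

If a host or port is dropped from one of the long argument lists, `pair_endpoints` raises `ValueError`, which is a cheap check to run before starting the proxy.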