[DOC] add request forwarding (#6780)

### What this PR does / why we need it?

- New section: "Request Forwarding" documentation in docs/source/tutorials/models/DeepSeek-V3.2.md
- Environment fix: change VLLM_ASCEND_ENABLE_FLASHCOMM1 from 0 to 1 in the DeepSeek-V3.2 configuration examples

### Does this PR introduce _any_ user-facing change?

Documentation update only; it provides new configuration guidance for request-forwarding setups.

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>
2026-02-25 14:43:51 +08:00
parent ad9d9569ea
commit 2260af405f
1 changed file with 33 additions and 2 deletions
--- a/docs/source/tutorials/models/DeepSeek-V3.2.md
+++ b/docs/source/tutorials/models/DeepSeek-V3.2.md
@@ -297,7 +297,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -350,7 +350,7 @@ export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=200
 export VLLM_ASCEND_ENABLE_MLAPO=1
 export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
-export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
+export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
 export HCCL_CONNECT_TIMEOUT=120
 export HCCL_INTRA_PCIE_ENABLE=1
 export HCCL_INTRA_ROCE_ENABLE=0
@@ -830,6 +830,37 @@ python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-s
 python launch_online_dp.py --dp-size 8 --tp-size 4 --dp-size-local 4 --dp-rank-start 4 --dp-address 141.61.39.117 --dp-rpc-port 12777 --vllm-start-port 9100
 ```

+### Request Forwarding
+
+To set up request forwarding, run the proxy server script {download}`load_balance_proxy_server_example.py <examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py>` on any machine that can reach both the prefill and decode nodes:
+
+```shell
+unset http_proxy
+unset https_proxy
+
+python load_balance_proxy_server_example.py \
+    --port 8000 \
+    --host 0.0.0.0 \
+    --prefiller-hosts \
+        141.61.39.105 \
+        141.61.39.113 \
+    --prefiller-ports \
+        9100 \
+        9100 \
+    --decoder-hosts \
+        141.61.39.117 \
+        141.61.39.117 \
+        141.61.39.117 \
+        141.61.39.117 \
+        141.61.39.181 \
+        141.61.39.181 \
+        141.61.39.181 \
+        141.61.39.181 \
+    --decoder-ports \
+        9100 9101 9102 9103 \
+        9100 9101 9102 9103
+```
+
 ## Functional Verification

 Once your server is started, you can query the model with input prompts:
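
The `--decoder-hosts` and `--decoder-ports` lists in the Request Forwarding command read as positional: the *i*-th host appears to pair with the *i*-th port, with each decoder node contributing one entry per local DP rank on consecutive ports starting at `--vllm-start-port` (four ranks on ports 9100 to 9103 here). Assuming that pairing and port layout, the sketch below is a hypothetical helper, not part of vllm-ascend, that rebuilds both lists from the node IPs so they stay aligned when nodes or DP sizes change. `DECODER_NODES`, `DP_SIZE_LOCAL`, and `VLLM_START_PORT` are illustrative names chosen to mirror the `launch_online_dp.py` arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-ascend): builds the positional
# --decoder-hosts / --decoder-ports lists for load_balance_proxy_server_example.py.
# Assumption: each decoder node runs DP_SIZE_LOCAL vLLM instances on
# consecutive ports starting at VLLM_START_PORT, and the proxy pairs
# host i with port i.
set -euo pipefail

DECODER_NODES=(141.61.39.117 141.61.39.181)
DP_SIZE_LOCAL=4
VLLM_START_PORT=9100

decoder_hosts=()
decoder_ports=()
for node in "${DECODER_NODES[@]}"; do
  # One (host, port) entry per local DP rank on this node.
  for ((i = 0; i < DP_SIZE_LOCAL; i++)); do
    decoder_hosts+=("$node")
    decoder_ports+=("$((VLLM_START_PORT + i))")
  done
done

# Print both lists so they can be pasted into the proxy command.
echo "--decoder-hosts ${decoder_hosts[*]}"
echo "--decoder-ports ${decoder_ports[*]}"
```

Generating both lists from the same loop keeps the host and port arrays the same length by construction, which is the property the hand-written lists must preserve.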