[Feat][Doc] Add a load_balance_dp_proxy in examples and external dp doc. (#4265)
### What this PR does / why we need it?
This PR adds a load-balancing DP proxy server for external DP
deployments that do not enable disaggregated prefill. It also adds
documentation for external DP and for the load-balancing DP proxy server.
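
As a rough illustration of what a load-balancing proxy in front of externally launched DP ranks might do, here is a minimal sketch of least-loaded backend selection. It is not the proxy added by this PR: the class names, the tie-breaking rule, and the example URLs and ports are all hypothetical, and the real server may use a different policy.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One externally launched DP rank, reachable at its own HTTP endpoint."""
    url: str           # hypothetical address of the rank's API server
    in_flight: int = 0  # requests currently routed to this rank


class LeastLoadedBalancer:
    """Route each request to the backend with the fewest in-flight requests.

    Ties go to the backend listed first. This policy is an assumption made
    for illustration, not the policy of the proxy in this PR.
    """

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one backend URL is required")
        self.backends = [Backend(u) for u in urls]

    def acquire(self) -> Backend:
        # min() returns the first minimal element, which gives the tie rule.
        backend = min(self.backends, key=lambda b: b.in_flight)
        backend.in_flight += 1
        return backend

    def release(self, backend: Backend) -> None:
        # Call this when the proxied request finishes or fails.
        backend.in_flight = max(0, backend.in_flight - 1)


if __name__ == "__main__":
    # Hypothetical ports for two DP ranks started as separate servers.
    lb = LeastLoadedBalancer(["http://127.0.0.1:8100", "http://127.0.0.1:8101"])
    first = lb.acquire()
    second = lb.acquire()
    lb.release(first)
    third = lb.acquire()
    print(first.url, second.url, third.url)
```

A real proxy would wrap `acquire` and `release` around each forwarded HTTP request, so that long-running generations count against a rank until they complete.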
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
See the new doc.
- vLLM version: v0.11.0
- vLLM main: 2918c1b49c
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
@@ -2,7 +2,6 @@ export HCCL_IF_IP=your_ip_here
 export GLOO_SOCKET_IFNAME=your_socket_ifname_here
 export TP_SOCKET_IFNAME=your_socket_ifname_here
 export HCCL_SOCKET_IFNAME=your_socket_ifname_here
-export DISAGGREGATED_PREFILL_RANK_TABLE_PATH=your_rank_table_path_here
 export VLLM_LOGGING_LEVEL="info"
 export OMP_PROC_BIND=false
 export OMP_NUM_THREADS=10
@@ -24,21 +23,10 @@ vllm serve model_path \
 --enable-expert-parallel \
 --seed 1024 \
 --served-model-name dsv3 \
---max-model-len 3500 \
---max-num-batched-tokens 3500 \
---max-num-seqs 28 \
+--max-model-len 8192 \
+--max-num-batched-tokens 2048 \
+--max-num-seqs 16 \
 --trust-remote-code \
 --gpu-memory-utilization 0.9 \
 --quantization ascend \
---speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' \
---kv-transfer-config \
-'{"kv_connector": "LLMDataDistCMgrConnector",
-"kv_buffer_device": "npu",
-"kv_role": "kv_consumer",
-"kv_parallel_size": "1",
-"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
-}' \
---additional-config \
-'{"ascend_scheduler_config": {"enabled": true}, "torchair_graph_config":{"enabled":true,"enable_kv_nz":false, "graph_batch_size":[28]}, "enable_weight_nz_layout":true, "enable_multistream_moe":false}'
+--speculative-config '{"num_speculative_tokens": 1, "method":"deepseek_mtp"}' \