[P/D][main] Retire the llmdatadist connector code and related files. (#4780)
### What this PR does / why we need it?
Now that the Mooncake connector is supported, the llmdatadist connector
is no longer maintained. This PR retires the llmdatadist-related code,
files, and documentation references.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
@@ -104,7 +104,7 @@ vllm-ascend is a hardware plugin for vLLM. Basically, the version of vllm-ascend
 
 ### 8. Does vllm-ascend support Prefill Disaggregation feature?
 
-Yes, vllm-ascend supports Prefill Disaggregation feature with LLMdatadist, Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_llmdatadist.html) for example.
+Yes, vllm-ascend supports Prefill Disaggregation feature with Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_mooncake.html) for example.
 
 ### 9. Does vllm-ascend support quantization method?
 
@@ -9,7 +9,7 @@ single_npu_qwen2_audio
 single_npu_qwen3_embedding
 single_npu_qwen3_quantization
 single_npu_qwen3_w4a4
-single_node_pd_disaggregation_llmdatadist
+single_node_pd_disaggregation_mooncake
 multi_npu_qwen3_next
 multi_npu
 multi_npu_moge
@@ -1,4 +1,4 @@
-# Prefill-Decode Disaggregation Llmdatadist Verification (Qwen2.5-VL)
+# Prefill-Decode Disaggregation Mooncake Verification (Qwen2.5-VL)
 
 ## Getting Start
 
@@ -69,10 +69,8 @@ export HCCL_IF_IP=192.0.0.1 # node ip
 export GLOO_SOCKET_IFNAME="eth0" # network card name
 export TP_SOCKET_IFNAME="eth0"
 export HCCL_SOCKET_IFNAME="eth0"
-export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
 export OMP_PROC_BIND=false
 export OMP_NUM_THREADS=10
-export VLLM_ASCEND_LLMDD_RPC_PORT=5959
 
 vllm serve /model/Qwen2.5-VL-7B-Instruct \
 --host 0.0.0.0 \
@@ -85,14 +83,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
 --max-num-batched-tokens 40000 \
 --trust-remote-code \
 --gpu-memory-utilization 0.9 \
---kv-transfer-config \
-'{"kv_connector": "LLMDataDistCMgrConnector",
-"kv_buffer_device": "npu",
-"kv_role": "kv_producer",
-"kv_parallel_size": 1,
-"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
+--kv-transfer-config \
+'{"kv_connector": "MooncakeConnector",
+"kv_role": "kv_producer",
+"kv_port": "30000",
+"engine_id": "0",
+"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
+"kv_connector_extra_config": {
+"prefill": {
+"dp_size": 1,
+"tp_size": 1
+},
+"decode": {
+"dp_size": 1,
+"tp_size": 1
+}
+}
 }'
 ```
 
@@ -106,10 +112,8 @@ export HCCL_IF_IP=192.0.0.1 # node ip
 export GLOO_SOCKET_IFNAME="eth0" # network card name
 export TP_SOCKET_IFNAME="eth0"
 export HCCL_SOCKET_IFNAME="eth0"
-export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
 export OMP_PROC_BIND=false
 export OMP_NUM_THREADS=10
-export VLLM_ASCEND_LLMDD_RPC_PORT=5979
 
 vllm serve /model/Qwen2.5-VL-7B-Instruct \
 --host 0.0.0.0 \
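Both environment hunks drop `DISAGGREGATED_PREFILL_RANK_TABLE_PATH` and `VLLM_ASCEND_LLMDD_RPC_PORT` from the tutorial, so a launch script adapted from the old llmdatadist instructions may still export them. The following is a minimal sketch of a pre-launch check. The helper name `find_stale_llmdatadist_vars` is hypothetical, and the only assumption is the one this diff suggests: that the Mooncake setup no longer reads these two variables.

```python
import os

# Variables this diff removes from the tutorial. Treating them as "stale"
# is an assumption drawn from the diff, not a documented guarantee.
STALE_VARS = (
    "DISAGGREGATED_PREFILL_RANK_TABLE_PATH",
    "VLLM_ASCEND_LLMDD_RPC_PORT",
)


def find_stale_llmdatadist_vars(env=None):
    """Return the llmdatadist-era variable names that are still set in env."""
    env = os.environ if env is None else env
    return [name for name in STALE_VARS if name in env]


if __name__ == "__main__":
    for name in find_stale_llmdatadist_vars():
        print(f"warning: {name} is set, but the Mooncake tutorial no longer exports it")
```

Leaving the variables set is probably harmless; the check only helps keep migrated scripts tidy.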
@@ -122,14 +126,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
 --max-num-batched-tokens 40000 \
 --trust-remote-code \
 --gpu-memory-utilization 0.9 \
---kv-transfer-config \
-'{"kv_connector": "LLMDataDistCMgrConnector",
-"kv_buffer_device": "npu",
+--kv-transfer-config \
+'{"kv_connector": "MooncakeConnector",
 "kv_role": "kv_consumer",
-"kv_parallel_size": 1,
-"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
+"kv_port": "30100",
+"engine_id": "1",
+"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
+"kv_connector_extra_config": {
+"prefill": {
+"dp_size": 1,
+"tp_size": 1
+},
+"decode": {
+"dp_size": 1,
+"tp_size": 1
+}
+}
 }'
 ```
 
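The prefill and decode hunks above pass nearly identical `--kv-transfer-config` payloads that differ only in `kv_role`, `kv_port`, and `engine_id`. As a rough sketch, the two payloads could be generated from one helper so they stay in sync. The function name `mooncake_kv_config` is hypothetical; the field names and values are copied from the diff. Whether the connector strictly requires distinct ports and engine IDs is not stated here, only that the tutorial uses them.

```python
import json
import shlex


def mooncake_kv_config(role, port, engine_id, prefill=(1, 1), decode=(1, 1)):
    """Build the JSON payload shown in the diff.

    prefill and decode are (dp_size, tp_size) pairs.
    """
    if role not in ("kv_producer", "kv_consumer"):
        raise ValueError(f"unexpected kv_role: {role!r}")
    return {
        "kv_connector": "MooncakeConnector",
        "kv_role": role,
        "kv_port": str(port),          # the tutorial passes ports as strings
        "engine_id": str(engine_id),
        "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
        "kv_connector_extra_config": {
            "prefill": {"dp_size": prefill[0], "tp_size": prefill[1]},
            "decode": {"dp_size": decode[0], "tp_size": decode[1]},
        },
    }


# Values taken from the prefill (producer) and decode (consumer) hunks.
producer = mooncake_kv_config("kv_producer", 30000, 0)
consumer = mooncake_kv_config("kv_consumer", 30100, 1)

for cfg in (producer, consumer):
    # shlex.quote yields a single shell-safe argument for `vllm serve`.
    print("--kv-transfer-config", shlex.quote(json.dumps(cfg)))
```

Serializing with `json.dumps` also avoids hand-written JSON mistakes such as a missing closing brace or trailing comma.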
@@ -137,7 +149,7 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
 
 :::::
 
-If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES, VLLM_ASCEND_LLMDD_RPC_PORT and port to different values for each P process.
+If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES and port to different values for each P process.
 
 ## Example Proxy for Deployment
 
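The updated "2P1D" note asks for a distinct `ASCEND_RT_VISIBLE_DEVICES` and port per prefill (P) process. One way to avoid collisions is to derive both from the process index, as in the sketch below. The helper `prefill_launch_settings` and the base port `8100` are hypothetical; the tutorial's actual port values are not visible in this diff.

```python
def prefill_launch_settings(num_prefill, devices_per_process=1, base_port=8100):
    """Assign non-overlapping device IDs and ports to each prefill process.

    base_port is an illustrative value, not one taken from the tutorial.
    """
    settings = []
    for index in range(num_prefill):
        first = index * devices_per_process
        devices = ",".join(str(d) for d in range(first, first + devices_per_process))
        settings.append({
            "ASCEND_RT_VISIBLE_DEVICES": devices,
            "port": base_port + index,
        })
    return settings


# Two prefill processes for a "2P1D" layout.
for entry in prefill_launch_settings(2):
    print(entry)
```

Each entry can then be exported and passed to its own `vllm serve` invocation.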