[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780)

### What this PR does / why we need it?
As support for the mooncake connector is now available, the llmdatadist
connector is no longer being maintained, so the llmdatadist-related
files need to be retired.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By ci

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
This commit is contained in:
wangxiaoteng888
2025-12-09 22:36:43 +08:00
committed by GitHub
parent 848419d1ba
commit a77045f355
19 changed files with 188 additions and 1819 deletions

View File

@@ -104,7 +104,7 @@ vllm-ascend is a hardware plugin for vLLM. Basically, the version of vllm-ascend
### 8. Does vllm-ascend support Prefill Disaggregation feature?
Yes, vllm-ascend supports Prefill Disaggregation feature with LLMdatadist, Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_llmdatadist.html) for example.
Yes, vllm-ascend supports Prefill Disaggregation feature with Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_mooncake.html) for example.
### 9. Does vllm-ascend support quantization method?

View File

@@ -9,7 +9,7 @@ single_npu_qwen2_audio
single_npu_qwen3_embedding
single_npu_qwen3_quantization
single_npu_qwen3_w4a4
single_node_pd_disaggregation_llmdatadist
single_node_pd_disaggregation_mooncake
multi_npu_qwen3_next
multi_npu
multi_npu_moge

View File

@@ -1,4 +1,4 @@
# Prefill-Decode Disaggregation Llmdatadist Verification (Qwen2.5-VL)
# Prefill-Decode Disaggregation Mooncake Verification (Qwen2.5-VL)
## Getting Start
@@ -69,10 +69,8 @@ export HCCL_IF_IP=192.0.0.1 # node ip
export GLOO_SOCKET_IFNAME="eth0" # network card name
export TP_SOCKET_IFNAME="eth0"
export HCCL_SOCKET_IFNAME="eth0"
export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export VLLM_ASCEND_LLMDD_RPC_PORT=5959
vllm serve /model/Qwen2.5-VL-7B-Instruct \
--host 0.0.0.0 \
@@ -85,14 +83,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
--max-num-batched-tokens 40000 \
--trust-remote-code \
--gpu-memory-utilization 0.9 \
--kv-transfer-config \
'{"kv_connector": "LLMDataDistCMgrConnector",
"kv_buffer_device": "npu",
"kv_role": "kv_producer",
"kv_parallel_size": 1,
"kv_port": "20001",
"engine_id": "0",
"kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
--kv-transfer-config \
'{"kv_connector": "MooncakeConnector",
"kv_role": "kv_producer",
"kv_port": "30000",
"engine_id": "0",
"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
"kv_connector_extra_config": {
"prefill": {
"dp_size": 1,
"tp_size": 1
},
"decode": {
"dp_size": 1,
"tp_size": 1
}
}
}'
```
@@ -106,10 +112,8 @@ export HCCL_IF_IP=192.0.0.1 # node ip
export GLOO_SOCKET_IFNAME="eth0" # network card name
export TP_SOCKET_IFNAME="eth0"
export HCCL_SOCKET_IFNAME="eth0"
export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export VLLM_ASCEND_LLMDD_RPC_PORT=5979
vllm serve /model/Qwen2.5-VL-7B-Instruct \
--host 0.0.0.0 \
@@ -122,14 +126,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
--max-num-batched-tokens 40000 \
--trust-remote-code \
--gpu-memory-utilization 0.9 \
--kv-transfer-config \
'{"kv_connector": "LLMDataDistCMgrConnector",
"kv_buffer_device": "npu",
--kv-transfer-config \
'{"kv_connector": "MooncakeConnector",
"kv_role": "kv_consumer",
"kv_parallel_size": 1,
"kv_port": "20001",
"engine_id": "0",
"kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
"kv_port": "30100",
"engine_id": "1",
"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
"kv_connector_extra_config": {
"prefill": {
"dp_size": 1,
"tp_size": 1
},
"decode": {
"dp_size": 1,
"tp_size": 1
}
}
}'
```
@@ -137,7 +149,7 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct \
:::::
If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES, VLLM_ASCEND_LLMDD_RPC_PORT and port to different values for each P process.
If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES and port to different values for each P process.
## Example Proxy for Deployment