[P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780)

### What this PR does / why we need it?

As support for the mooncake connector is now available, the llmdatadist connector is no longer being maintained, so the llmdatadist-related files need to be retired.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By ci

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-12-09 22:36:43 +08:00
parent 848419d1ba
commit a77045f355
19 changed files with 188 additions and 1819 deletions
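
The tutorial changes in this commit replace the `LLMDataDistCMgrConnector` settings with a `MooncakeConnector` JSON passed to `--kv-transfer-config`. As a rough illustration only, the sketch below builds that JSON for the prefill and decode instances using the keys and values that appear in the updated tutorial. The helper name `mooncake_kv_transfer_config` and its parameters are hypothetical and not part of vllm-ascend, and the role check is a convenience of this sketch rather than a rule taken from the connector.

```python
import json


def mooncake_kv_transfer_config(kv_role, kv_port, engine_id,
                                prefill_dp=1, prefill_tp=1,
                                decode_dp=1, decode_tp=1):
    """Hypothetical helper: return a JSON string for --kv-transfer-config.

    Keys mirror the updated tutorial in this commit; nothing here is
    validated against the connector itself.
    """
    # This sketch only accepts the two roles used in the tutorial.
    if kv_role not in ("kv_producer", "kv_consumer"):
        raise ValueError(f"unexpected kv_role: {kv_role!r}")
    config = {
        "kv_connector": "MooncakeConnector",
        "kv_role": kv_role,
        # The tutorial writes the port and engine id as strings.
        "kv_port": str(kv_port),
        "engine_id": str(engine_id),
        "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
        "kv_connector_extra_config": {
            "prefill": {"dp_size": prefill_dp, "tp_size": prefill_tp},
            "decode": {"dp_size": decode_dp, "tp_size": decode_tp},
        },
    }
    return json.dumps(config)


# Values taken from the tutorial: prefill on port 30000, decode on port 30100.
prefill_cfg = mooncake_kv_transfer_config("kv_producer", 30000, 0)
decode_cfg = mooncake_kv_transfer_config("kv_consumer", 30100, 1)
print(prefill_cfg)
print(decode_cfg)
```

Each printed string could then be single-quoted and passed as the argument to `--kv-transfer-config` in the corresponding `vllm serve` command. Generating the two configs from one function keeps the shared `kv_connector_extra_config` identical on both sides, as it is in the tutorial.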
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -104,7 +104,7 @@ vllm-ascend is a hardware plugin for vLLM. Basically, the version of vllm-ascend

 ### 8. Does vllm-ascend support Prefill Disaggregation feature?

-Yes, vllm-ascend supports Prefill Disaggregation feature with LLMdatadist, Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_llmdatadist.html) for example.
+Yes, vllm-ascend supports Prefill Disaggregation feature with Mooncake backend. Take [official tutorial](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node_pd_disaggregation_mooncake.html) for example.

 ### 9. Does vllm-ascend support quantization method?

--- a/docs/source/tutorials/index.md
+++ b/docs/source/tutorials/index.md
@@ -9,7 +9,7 @@ single_npu_qwen2_audio
 single_npu_qwen3_embedding
 single_npu_qwen3_quantization
 single_npu_qwen3_w4a4
-single_node_pd_disaggregation_llmdatadist
+single_node_pd_disaggregation_mooncake
 multi_npu_qwen3_next
 multi_npu
 multi_npu_moge
--- a/docs/source/tutorials/single_node_pd_disaggregation_llmdatadist.md
+++ b/docs/source/tutorials/single_node_pd_disaggregation_llmdatadist.md
@@ -1,4 +1,4 @@
-# Prefill-Decode Disaggregation Llmdatadist Verification (Qwen2.5-VL)
+# Prefill-Decode Disaggregation Mooncake Verification (Qwen2.5-VL)

 ## Getting Start

@@ -69,10 +69,8 @@ export HCCL_IF_IP=192.0.0.1 # node ip
 export GLOO_SOCKET_IFNAME="eth0"  # network card name
 export TP_SOCKET_IFNAME="eth0"
 export HCCL_SOCKET_IFNAME="eth0"
-export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
 export OMP_PROC_BIND=false
 export OMP_NUM_THREADS=10
-export VLLM_ASCEND_LLMDD_RPC_PORT=5959

 vllm serve /model/Qwen2.5-VL-7B-Instruct  \
  --host 0.0.0.0 \
@@ -85,14 +83,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct  \
  --max-num-batched-tokens 40000  \
  --trust-remote-code \
  --gpu-memory-utilization 0.9  \
-  --kv-transfer-config  \
-  '{"kv_connector": "LLMDataDistCMgrConnector",
-    "kv_buffer_device": "npu",
-    "kv_role": "kv_producer",
-    "kv_parallel_size": 1,
-    "kv_port": "20001",
-    "engine_id": "0",
-    "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
+  --kv-transfer-config \
+  '{"kv_connector": "MooncakeConnector",
+  "kv_role": "kv_producer",
+  "kv_port": "30000",
+  "engine_id": "0",
+  "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
+  "kv_connector_extra_config": {
+            "prefill": {
+                    "dp_size": 1,
+                    "tp_size": 1
+             },
+             "decode": {
+                    "dp_size": 1,
+                    "tp_size": 1
+             }
+      }
  }'
 ```

@@ -106,10 +112,8 @@ export HCCL_IF_IP=192.0.0.1  # node ip
 export GLOO_SOCKET_IFNAME="eth0"  # network card name
 export TP_SOCKET_IFNAME="eth0"
 export HCCL_SOCKET_IFNAME="eth0"
-export DISAGGREGATED_PREFILL_RANK_TABLE_PATH="/path/to/your/generated/ranktable.json"
 export OMP_PROC_BIND=false
 export OMP_NUM_THREADS=10
-export VLLM_ASCEND_LLMDD_RPC_PORT=5979

 vllm serve /model/Qwen2.5-VL-7B-Instruct  \
  --host 0.0.0.0 \
@@ -122,14 +126,22 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct  \
  --max-num-batched-tokens 40000  \
  --trust-remote-code \
  --gpu-memory-utilization 0.9  \
-  --kv-transfer-config  \
-  '{"kv_connector": "LLMDataDistCMgrConnector",
-  "kv_buffer_device": "npu",
+  --kv-transfer-config \
+  '{"kv_connector": "MooncakeConnector",
  "kv_role": "kv_consumer",
-  "kv_parallel_size": 1,
-  "kv_port": "20001",
-  "engine_id": "0",
-  "kv_connector_module_path": "vllm_ascend.distributed.llmdatadist_c_mgr_connector"
+  "kv_port": "30100",
+  "engine_id": "1",
+  "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector",
+  "kv_connector_extra_config": {
+            "prefill": {
+                    "dp_size": 1,
+                    "tp_size": 1
+             },
+             "decode": {
+                    "dp_size": 1,
+                    "tp_size": 1
+             }
+      }
  }'
 ```

@@ -137,7 +149,7 @@ vllm serve /model/Qwen2.5-VL-7B-Instruct  \

 :::::

-If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES, VLLM_ASCEND_LLMDD_RPC_PORT and port to different values for each P process.
+If you want to run "2P1D", please set ASCEND_RT_VISIBLE_DEVICES and port to different values for each P process.

 ## Example Proxy for Deployment