[Refactor]Refactor of vllm_ascend/distributed module (#5719)

### What this PR does / why we need it?
Based on the RFC: https://github.com/vllm-project/vllm-ascend/issues/5604

This PR refactors vllm_ascend/distributed, moving all kv_transfer-related
code into a dedicated folder, as has already been done in vLLM.
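
The documentation hunks in this commit drop `kv_connector_module_path` from every `--kv-transfer-config` example. A minimal sketch of removing that key from an existing config string follows. It assumes that this is the only change a launch script needs, which is inferred from the diff rather than stated in the PR. The helper name `drop_module_path` is hypothetical and not part of vllm-ascend.

```python
import json


def drop_module_path(kv_transfer_config: str) -> str:
    """Return the config JSON without the kv_connector_module_path key.

    Hypothetical helper for illustration only; it is not a vllm-ascend API.
    """
    config = json.loads(kv_transfer_config)
    # pop with a default so configs that never had the key pass through unchanged
    config.pop("kv_connector_module_path", None)
    # compact separators match the single-line style used in the YAML examples
    return json.dumps(config, separators=(",", ":"))


# Old-style config, shortened from the examples touched by this commit
old = (
    '{"kv_connector":"MooncakeConnectorV1","kv_role":"kv_producer",'
    '"kv_port":"20001","engine_id":"0",'
    '"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector"}'
)
new = drop_module_path(old)
print(new)
```

The result can be passed as the value of `--kv-transfer-config`; all other keys are preserved in their original order.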

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?


- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: lty <linhebiwen@gmail.com>
Author: lty
Date: 2026-01-15 08:57:40 +08:00
Committed by GitHub
Parent: f34b3b8ee9
Commit: 295018ec0f
56 changed files with 300 additions and 293 deletions

@@ -137,7 +137,7 @@ spec:
- "--trust-remote-code"
- "--enforce-eager"
- "--kv-transfer-config"
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_producer","kv_parallel_size":1,"kv_port":"20001","engine_id":"0","kv_rank":0,"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector","kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_producer","kv_parallel_size":1,"kv_port":"20001","engine_id":"0","kv_rank":0,"kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
imagePullPolicy: Always
resources:
limits:
@@ -240,7 +240,7 @@ spec:
- "--no-enable-prefix-caching"
- "--enforce-eager"
- "--kv-transfer-config"
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_consumer","kv_parallel_size":1,"kv_port":"20002","engine_id":"1","kv_rank":1,"kv_connector_module_path":"vllm_ascend.distributed.mooncake_connector","kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
- '{"kv_connector":"MooncakeConnectorV1","kv_buffer_device":"npu","kv_role":"kv_consumer","kv_parallel_size":1,"kv_port":"20002","engine_id":"1","kv_rank":1,"kv_connector_extra_config":{"prefill":{"dp_size":2,"tp_size":2},"decode":{"dp_size":2,"tp_size":2}}}'
imagePullPolicy: Always
resources:
limits:

@@ -163,8 +163,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
"kv_role": "kv_producer",
"kv_parallel_size": "1",
"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
}'
--additional-config '{"enable_weight_nz_layout":true,"enable_prefill_optimizations":true}'
```
@@ -230,8 +229,7 @@ vllm serve vllm-ascend/DeepSeek-R1-W8A8 \
"kv_role": "kv_consumer",
"kv_parallel_size": "1",
"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
}' \
--additional-config '{"enable_weight_nz_layout":true}'
```
@@ -435,8 +433,7 @@ In the PD separation scenario, we provide a optimized configuration.
"kv_role": "kv_producer",
"kv_parallel_size": "1",
"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
}'
```
@@ -458,8 +455,7 @@ In the PD separation scenario, we provide a optimized configuration.
"kv_role": "kv_consumer",
"kv_parallel_size": "1",
"kv_port": "20001",
-"engine_id": "0",
-"kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector"
+"engine_id": "0"
}'
```