xc-llm-ascend/vllm_ascend/distributed/kv_transfer/ascend_multi_connector.py

from typing import TYPE_CHECKING

from vllm.distributed.kv_transfer.kv_connector.v1.multi_connector import MultiConnector

from vllm_ascend.distributed.kv_transfer.kv_p2p.mooncake_layerwise_connector import MooncakeLayerwiseConnector

if TYPE_CHECKING:
    from vllm.v1.core.kv_cache_manager import KVCacheBlocks
    from vllm.v1.request import Request


class AscendMultiConnector(MultiConnector):
    """MultiConnector variant that always forwards allocated blocks to MooncakeLayerwiseConnector."""

    def update_state_after_alloc(self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int):
        # Index of the connector chosen for this request; -1 if none was chosen.
        chosen_connector = self._requests_to_connector.get(request.request_id, -1)
        empty_blocks = blocks.new_empty()
        for i, c in enumerate(self._connectors):
            if i == chosen_connector or isinstance(c, MooncakeLayerwiseConnector):
                # Forward the real blocks to the chosen connector (if any) and
                # to every MooncakeLayerwiseConnector, whether chosen or not.
                c.update_state_after_alloc(request, blocks, num_external_tokens)
            else:
                # All other connectors receive empty blocks and zero external tokens.
                c.update_state_after_alloc(request, empty_blocks, 0)
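
The dispatch rule in `update_state_after_alloc` can be tried in isolation with the following minimal sketch. It does not import vLLM: `FakeBlocks`, `RecordingConnector`, `LayerwiseConnector`, and `dispatch` are hypothetical stand-ins invented for illustration, and only the branching logic mirrors the method above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeBlocks:
    """Hypothetical stand-in for KVCacheBlocks; only new_empty() is modelled."""
    block_ids: List[int] = field(default_factory=list)

    def new_empty(self) -> "FakeBlocks":
        return FakeBlocks([])


class RecordingConnector:
    """Hypothetical connector that records the arguments of each call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: List[Tuple[str, List[int], int]] = []

    def update_state_after_alloc(self, request_id: str, blocks: FakeBlocks, num_external_tokens: int) -> None:
        self.calls.append((request_id, list(blocks.block_ids), num_external_tokens))


class LayerwiseConnector(RecordingConnector):
    """Hypothetical stand-in for MooncakeLayerwiseConnector."""


def dispatch(connectors, chosen_index, request_id, blocks, num_external_tokens):
    """Same branching as AscendMultiConnector.update_state_after_alloc."""
    empty = blocks.new_empty()
    for i, c in enumerate(connectors):
        if i == chosen_index or isinstance(c, LayerwiseConnector):
            # Chosen connector and every layerwise connector see the real blocks.
            c.update_state_after_alloc(request_id, blocks, num_external_tokens)
        else:
            # Everything else sees empty blocks and zero external tokens.
            c.update_state_after_alloc(request_id, empty, 0)


pool = RecordingConnector("pool")
layerwise = LayerwiseConnector("layerwise")
other = RecordingConnector("other")
dispatch([pool, layerwise, other], chosen_index=0, request_id="req-1",
         blocks=FakeBlocks([3, 4]), num_external_tokens=32)
```

In this sketch the connector at index 0 is the chosen one, so `pool` and `layerwise` both record the real block IDs, while `other` records an empty allocation.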

[P/D][KVPool] Mooncake Layerwise Connector supports kv_pool (#7032)

### What this PR does / why we need it?
This PR creates and registers `ascend_multi_connector`, which allows the `mooncake_layerwise_connector` to use the kv_pooling feature. We unregister the original vllm's `MultiConnector` and replace it with `AscendMultiConnector` when registering the connectors.

### Does this PR introduce _any_ user-facing change?
No. User can use `MultiConnector` to initialize `AscendMultiConnector`.

### How was this patch tested?
By CI.

- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
2026-03-09 10:49:04 +08:00
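
The "unregister, then re-register under the same name" step described in the commit message can be illustrated with a toy registry. This is only a sketch of the pattern: `ToyConnectorRegistry`, `BaseMulti`, and `AscendMulti` are hypothetical names, and the code is not vLLM's or vllm-ascend's actual registration code.

```python
from typing import Callable, Dict


class ToyConnectorRegistry:
    """Hypothetical name -> class-loader registry, for illustration only."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], type]] = {}

    def register(self, name: str, loader: Callable[[], type]) -> None:
        # Refuse duplicates, so a replacement must unregister first.
        if name in self._loaders:
            raise ValueError(f"Connector '{name}' is already registered.")
        self._loaders[name] = loader

    def unregister(self, name: str) -> None:
        self._loaders.pop(name, None)

    def create(self, name: str):
        # Resolve the class lazily, then instantiate it.
        return self._loaders[name]()()


class BaseMulti:
    """Hypothetical stand-in for the upstream MultiConnector."""


class AscendMulti(BaseMulti):
    """Hypothetical stand-in for AscendMultiConnector."""


registry = ToyConnectorRegistry()
registry.register("MultiConnector", lambda: BaseMulti)

# Swap the implementation while keeping the user-facing name unchanged.
registry.unregister("MultiConnector")
registry.register("MultiConnector", lambda: AscendMulti)

connector = registry.create("MultiConnector")
```

Because the public name stays `MultiConnector`, a configuration that asks for that name would get the subclass without any change on the user's side, which matches the "no user-facing change" statement above.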