Files
xc-llm-ascend/vllm_ascend/distributed/__init__.py

45 lines
1.6 KiB
Python
Raw Normal View History

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from vllm.distributed.kv_transfer.kv_connector.factory import \
KVConnectorFactory
def register_connector():
KVConnectorFactory.register_connector(
"MooncakeConnectorV1", "vllm_ascend.distributed.mooncake_connector",
"MooncakeConnector")
KVConnectorFactory.register_connector(
"MooncakeConnectorStoreV1",
"vllm_ascend.distributed.kvpool.ascend_store_connector",
"AscendStoreConnector")
KVConnectorFactory.register_connector(
"AscendStoreConnector",
"vllm_ascend.distributed.kvpool.ascend_store_connector",
"AscendStoreConnector")
KVConnectorFactory.register_connector(
"MooncakeLayerwiseConnector",
"vllm_ascend.distributed.mooncake_layerwise_connector",
"MooncakeLayerwiseConnector")
[Core][Worker] Add UCMConnector for KV Cache Offloading (#4411) ### What this PR does / why we need it? This PR introduces the initial integration of **UCM (Unified Cache Management)** into the vllm-ascend distributed KV-cache system. Specifically, it adds: - A new `UCMConnector` implementation under the distributed KV-transfer framework. - Support for offloading KV-cache blocks to external UCM backends (DRAM / NFS / Localdisk), depending on UCM configuration). - Integration with vLLM V1 KV connector interface, including metadata handling and role registration. **Why it is needed:** - UCM provides a unified, high-performance storage layer for KV-cache externalization. - This enables vllm-ascend to support out-of-core KV-cache workloads, improve memory efficiency, and leverage hardware-accelerated storage paths (RDMA / NFS / hybrid modes). - This connector is a required component to allow future work on multi-node inference + UCM-based scaling. --- ### Does this PR introduce _any_ user-facing change? Yes, but limited: - A new `kv_connector=UCMConnector` option becomes available through the configuration interface. - When selected, vllm-ascend workers may initialize UCM and offload KV-cache blocks externally. - No default behaviors are changed. Users must explicitly enable this connector. This PR does **not** modify: - existing APIs, - default execution paths, - model runner behavior, - user workflow unless `UCMConnector` is configured. --- ### How was this patch tested? --- ### Prefix Caching Benchmark We provide preliminary measurements for TTFT (ms) under VLLM benchmark. Tests run on 2 * Ascend 910B3, vllm-ascend 0.11.0, Tensor Parallel size 2, with UCM (Localdisk) enabled. - vLLM version: v0.12.0 - vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9 Signed-off-by: UnifiedCacheManager <unifiedcachem@163.com>
2025-12-16 10:53:30 +08:00
KVConnectorFactory.register_connector(
"UCMConnector", "vllm_ascend.distributed.ucm_connector",
"UCMConnectorV1")