### What this PR does / why we need it?
This PR introduces the initial integration of **UCM (Unified Cache
Management)** into the vllm-ascend distributed KV-cache system.
Specifically, it adds:
- A new `UCMConnector` implementation under the distributed KV-transfer
framework.
- Support for offloading KV-cache blocks to external UCM backends (DRAM
/ NFS / Localdisk, depending on the UCM configuration).
- Integration with the vLLM V1 KV connector interface, including metadata
handling and role registration.
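As a rough illustration of the name-based registration this PR relies on, the sketch below maps a connector name to a module path and class name and imports the module only when the connector is requested. It is a simplified stand-in, not vLLM's `KVConnectorFactory` implementation: `LazyConnectorRegistry`, `get_connector_class`, and `DemoConnector` are hypothetical names, and a standard-library class is used as the target so the example runs without vLLM installed.

```python
import importlib
from typing import Callable, Dict


class LazyConnectorRegistry:
    """Toy registry (hypothetical): maps a connector name to a lazily imported class."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], type]] = {}

    def register_connector(self, name: str, module_path: str,
                           class_name: str) -> None:
        if name in self._loaders:
            raise ValueError(f"Connector '{name}' is already registered.")

        def loader() -> type:
            # The module is imported only when the connector is requested,
            # so registering a connector does not pull in its dependencies.
            module = importlib.import_module(module_path)
            return getattr(module, class_name)

        self._loaders[name] = loader

    def get_connector_class(self, name: str) -> type:
        if name not in self._loaders:
            raise ValueError(f"Unsupported connector type: {name}")
        return self._loaders[name]()


registry = LazyConnectorRegistry()
# Stand-in target from the standard library (an assumption for this demo);
# a real registration would point at a connector module instead.
registry.register_connector("DemoConnector", "collections", "OrderedDict")
connector_cls = registry.get_connector_class("DemoConnector")
print(connector_cls.__name__)
```

Deferring the import this way means an optional backend only needs to be installed when its connector is actually selected.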
**Why it is needed:**
- UCM provides a unified, high-performance storage layer for KV-cache
externalization.
- This enables vllm-ascend to support out-of-core KV-cache workloads,
improve memory efficiency, and leverage hardware-accelerated storage
paths (RDMA / NFS / hybrid modes).
- This connector is a prerequisite for future work on multi-node
inference and UCM-based scaling.
---
### Does this PR introduce _any_ user-facing change?
Yes, but limited:
- A new `kv_connector=UCMConnector` option becomes available through the
configuration interface.
- When selected, vllm-ascend workers may initialize UCM and offload
KV-cache blocks externally.
- No default behaviors are changed. Users must explicitly enable this
connector.
This PR does **not** modify:
- existing APIs,
- default execution paths,
- model runner behavior,
- user workflow unless `UCMConnector` is configured.
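A minimal sketch of how the connector might be selected, assuming it is passed through vLLM's `--kv-transfer-config` JSON option. The `kv_role` value and the `<model>` placeholder are illustrative assumptions, and any UCM-specific settings (backend type, storage paths) are omitted because this description does not specify them.

```python
import json
import shlex

# Assumption: "kv_both" is used purely for illustration; choose the role
# that matches your deployment.
kv_transfer_config = {
    "kv_connector": "UCMConnector",
    "kv_role": "kv_both",
}

config_json = json.dumps(kv_transfer_config)

# "<model>" is a placeholder, not a real model name.
command = ["vllm", "serve", "<model>", "--kv-transfer-config", config_json]

# Print a shell-quoted command line that can be copied into a terminal.
print(shlex.join(command))
```

Without this option the connector is never instantiated, which matches the opt-in behavior described above.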
---
### How was this patch tested?
---
### Prefix Caching Benchmark
We provide preliminary TTFT (ms) measurements under the vLLM benchmark.
Tests were run on 2 × Ascend 910B3 with vllm-ascend 0.11.0, tensor
parallel size 2, and UCM (Localdisk) enabled.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: UnifiedCacheManager <unifiedcachem@163.com>
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from vllm.distributed.kv_transfer.kv_connector.factory import \
    KVConnectorFactory


def register_connector():
    KVConnectorFactory.register_connector(
        "MooncakeConnectorV1", "vllm_ascend.distributed.mooncake_connector",
        "MooncakeConnector")

    KVConnectorFactory.register_connector(
        "MooncakeConnectorStoreV1",
        "vllm_ascend.distributed.kvpool.ascend_store_connector",
        "AscendStoreConnector")

    KVConnectorFactory.register_connector(
        "AscendStoreConnector",
        "vllm_ascend.distributed.kvpool.ascend_store_connector",
        "AscendStoreConnector")

    KVConnectorFactory.register_connector(
        "MooncakeLayerwiseConnector",
        "vllm_ascend.distributed.mooncake_layerwise_connector",
        "MooncakeLayerwiseConnector")

    KVConnectorFactory.register_connector(
        "UCMConnector", "vllm_ascend.distributed.ucm_connector",
        "UCMConnectorV1")