[feat] add LMCacheAscendConnector (#6882)

### What this PR does / why we need it?

LMCache-Ascend is LMCache's solution on the Ascend platform and one of
the KVCache pooling solutions for Ascend. We hope to integrate
LMCache-Ascend into the vLLM-Ascend community as one of the official
KVCache pooling solutions for vLLM-Ascend.

We added a new LMCacheAscendConnector in vLLM-Ascend and registered it.

### Does this PR introduce _any_ user-facing change?

Users can specify the kvconnector using `--kv-transfer-config`, allowing
them to freely choose which kvconnector to use, without any user-facing
change.

### How was this patch tested?

Test by specifying `--kv-transfer-config
'{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'`

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: chloroethylene <jjysama@gmail.com>
This commit is contained in:
Junyuan
2026-03-13 17:41:35 +08:00
committed by GitHub
parent 986cd45397
commit 6852a2e267
5 changed files with 109 additions and 0 deletions

View File

@@ -28,4 +28,5 @@ npugraph_ex
weight_prefetch
sequence_parallelism
batch_invariance
lmcache_ascend_deployment
:::

View File

@@ -0,0 +1,94 @@
# LMCache-Ascend Deployment Guide
## Overview
LMCache-Ascend is a community maintained plugin for running LMCache on the Ascend NPU.
We provide a simple deployment guide here. For further info about deployment notes, please refer to [LMCache-Ascend doc](https://github.com/LMCache/LMCache-Ascend/blob/main/README.md)
## Getting Started
### Clone LMCache-Ascend Repo
Our repo contains a kvcache ops submodule for ease of maintenance, therefore we recommend cloning the repo with submodules.
```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```
### Docker
```bash
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
```
Once that is built, run it with the following cmd
```bash
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
--privileged \
--cap-add=SYS_RESOURCE \
--cap-add=IPC_LOCK \
-p 8000:8000 \
-p 8001:8001 \
--name lmcache-ascend-dev \
-e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_TOTAL_MEMORY_GB=32 \
-e VLLM_TARGET_DEVICE=npu \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/localtime:/etc/localtime \
-v /var/log/npu:/var/log/npu \
-v /dev/davinci_manager:/dev/davinci_manager \
-v /dev/devmm_svm:/dev/devmm_svm \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/hccn.conf:/etc/hccn.conf \
lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
/bin/bash
```
### Manual Installation
Assuming your working directory is ```/workspace``` and vllm/vllm-ascend have already been installed.
1. Install LMCache Repo
```bash
NO_CUDA_EXT=1 pip install lmcache==0.3.12
```
2. Install LMCache-Ascend Repo
```bash
cd /workspace/LMCache-Ascend
python3 -m pip install -v --no-build-isolation -e .
```
### Usage
We introduce a dynamic KVConnector via LMCacheAscendConnectorV1Dynamic, therefore LMCache-Ascend Connector can be used via the kv transfer config in the two following setting.
#### Online serving
```bash
python \
-m vllm.entrypoints.openai.api_server \
--port 8100 \
--model /data/models/Qwen/Qwen3-32B \
--trust-remote-code \
--disable-log-requests \
--block-size 128 \
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```
#### Offline
```python
ktc = KVTransferConfig(
kv_connector="LMCacheAscendConnector",
kv_role="kv_both"
)
```

View File

@@ -37,3 +37,6 @@ ignore_missing_imports = True
[mypy-msmodelslim.*]
ignore_missing_imports = True
[mypy-lmcache_ascend.*]
ignore_missing_imports = True

View File

@@ -51,3 +51,9 @@ def register_connector():
KVConnectorFactory.register_connector(
"UCMConnector", "vllm_ascend.distributed.kv_transfer.kv_pool.ucm_connector", "UCMConnectorV1"
)
KVConnectorFactory.register_connector(
"LMCacheAscendConnector",
"vllm_ascend.distributed.kv_transfer.kv_pool.lmcache_ascend_connector",
"LMCacheConnectorV1",
)

View File

@@ -0,0 +1,5 @@
# SPDX-License-Identifier: Apache-2.0
import lmcache_ascend # noqa: F401
from vllm.distributed.kv_transfer.kv_connector.v1.lmcache_connector import LMCacheConnectorV1
__all__ = ["LMCacheConnectorV1"]