### What this PR does / why we need it?
LMCache-Ascend is LMCache's solution for the Ascend platform and one of
the KVCache pooling solutions available on Ascend. We hope to make
LMCache-Ascend one of the official KVCache pooling solutions of the
vLLM-Ascend community. To that end, this PR adds a new
`LMCacheAscendConnector` to vLLM-Ascend and registers it.
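For context, vLLM resolves the `kv_connector` name from the KV transfer config through a connector factory (`KVConnectorFactory`), where connectors are registered by name, module path and class name and imported lazily. The sketch below is a simplified, hypothetical stand-in for that pattern: `register_connector` and `get_connector_class` here are illustrative helpers, not vLLM's actual API, and because the real module path of `LMCacheAscendConnector` is not given in this PR, any usage should substitute its own module and class.

```python
import importlib
from typing import Callable, Dict, Type

# Hypothetical, simplified registry: maps a connector name (the value of
# "kv_connector" in the KV transfer config) to a lazy class loader.
_REGISTRY: Dict[str, Callable[[], Type]] = {}


def register_connector(name: str, module_path: str, class_name: str) -> None:
    """Record where a connector class lives without importing it yet."""
    if name in _REGISTRY:
        raise ValueError(f"Connector {name!r} is already registered")

    def loader() -> Type:
        # The module is imported only when the connector is first requested,
        # so unused connectors never pull in their dependencies.
        module = importlib.import_module(module_path)
        return getattr(module, class_name)

    _REGISTRY[name] = loader


def get_connector_class(name: str) -> Type:
    """Resolve a connector name from the config to its class."""
    if name not in _REGISTRY:
        raise ValueError(f"Unknown kv_connector {name!r}")
    return _REGISTRY[name]()
```

Lazy loading is the key design choice: registering a connector costs nothing until a user actually selects it in `--kv-transfer-config`.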
### Does this PR introduce _any_ user-facing change?
No. The new connector is opt-in: users select it through
`--kv-transfer-config`, just as they choose any other KV connector, and
existing configurations are unaffected.
### How was this patch tested?
Tested by launching vLLM with
`--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'`.
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
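Hand-quoting JSON on the command line is error-prone. A minimal sketch, using only the standard library, that builds the exact `--kv-transfer-config` argument used in the test above; the helper names are hypothetical.

```python
import json
import shlex


def build_kv_transfer_config(connector: str = "LMCacheAscendConnector",
                             role: str = "kv_both") -> str:
    """Return the compact JSON string passed to --kv-transfer-config."""
    # json.dumps guarantees valid JSON (double quotes, escaping);
    # the separators argument drops the spaces for a compact string.
    return json.dumps({"kv_connector": connector, "kv_role": role},
                      separators=(",", ":"))


def shell_arg(config: str) -> str:
    """Quote the JSON safely for a POSIX shell command line."""
    return "--kv-transfer-config " + shlex.quote(config)


if __name__ == "__main__":
    print(shell_arg(build_kv_transfer_config()))
```

Run as a script, this prints the flag exactly as written in the test command, single quotes included.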
---------
Signed-off-by: chloroethylene <jjysama@gmail.com>
# LMCache-Ascend Deployment Guide

## Overview
LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.
This is a brief deployment guide; for further deployment notes, please refer to the LMCache-Ascend documentation.
## Getting Started

### Clone LMCache-Ascend Repo
The repo includes a kvcache ops submodule for ease of maintenance, so we recommend cloning it with submodules:
```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```
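If the repo was cloned without `--recurse-submodules`, the submodule directory stays empty; `git submodule update --init --recursive` fetches it afterwards. A small sketch (hypothetical helper names) that detects this case by parsing `git submodule status`, whose lines start with `-` for uninitialized submodules:

```python
import subprocess
from typing import List


def uninitialized_submodules(status_output: str) -> List[str]:
    """Return submodule paths reported as not initialized.

    Each line of `git submodule status` starts with a one-character flag:
    '-' = not initialized, '+' = different commit checked out,
    'U' = merge conflicts, ' ' = up to date.
    """
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized lines look like "-<sha1> <path>".
            missing.append(line[1:].split()[1])
    return missing


def check_repo(repo_dir: str) -> List[str]:
    """Run `git submodule status --recursive` in repo_dir and parse it."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return uninitialized_submodules(result.stdout)
```

For example, `check_repo("/workspace/LMCache-Ascend")` returning a non-empty list means the submodules still need to be initialized.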
### Docker

```bash
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
```
Once the image is built, run it with the following command:
```bash
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
  --privileged \
  --cap-add=SYS_RESOURCE \
  --cap-add=IPC_LOCK \
  -p 8000:8000 \
  -p 8001:8001 \
  --name lmcache-ascend-dev \
  -e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
  -e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
  -e ASCEND_TOTAL_MEMORY_GB=32 \
  -e VLLM_TARGET_DEVICE=npu \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /etc/localtime:/etc/localtime \
  -v /var/log/npu:/var/log/npu \
  -v /dev/davinci_manager:/dev/davinci_manager \
  -v /dev/devmm_svm:/dev/devmm_svm \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /etc/hccn.conf:/etc/hccn.conf \
  lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
  /bin/bash
```
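The command above passes the same `DEVICE_LIST` to both `ASCEND_VISIBLE_DEVICES` and `ASCEND_RT_VISIBLE_DEVICES`. If you generate the command from a script, a helper like the hypothetical one below keeps the two values in sync and rejects malformed lists. The limit of 8 devices per node is an assumption taken from the example `DEVICE_LIST`; adjust it for your hardware.

```python
from typing import List


def parse_device_list(device_list: str, num_devices: int = 8) -> List[int]:
    """Parse a comma-separated NPU ID list such as "0,1,2,3"."""
    ids = [int(tok) for tok in device_list.split(",") if tok.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device IDs in {device_list!r}")
    for i in ids:
        # Assumed 0-based IDs on a node with num_devices NPUs.
        if not 0 <= i < num_devices:
            raise ValueError(f"device ID {i} outside 0..{num_devices - 1}")
    return ids


def device_env_flags(device_list: str) -> List[str]:
    """Build docker `-e` flags so both visibility variables always match."""
    value = ",".join(str(i) for i in parse_device_list(device_list))
    flags: List[str] = []
    for var in ("ASCEND_VISIBLE_DEVICES", "ASCEND_RT_VISIBLE_DEVICES"):
        flags += ["-e", f"{var}={value}"]
    return flags
```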
### Manual Installation

The following assumes your working directory is `/workspace` and that vllm and vllm-ascend are already installed.
- Install LMCache:

  ```bash
  NO_CUDA_EXT=1 pip install lmcache==0.3.12
  ```

- Install LMCache-Ascend from the cloned repo:

  ```bash
  cd /workspace/LMCache-Ascend
  python3 -m pip install -v --no-build-isolation -e .
  ```
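To confirm that the pinned LMCache version is the one actually installed, you can query the installed distribution metadata. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the distribution name of LMCache-Ascend itself is not stated in this guide, so only `lmcache` is checked here.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pin(dist_name: str, expected: str) -> bool:
    """True only if dist_name is installed at exactly the expected version."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # This guide pins lmcache==0.3.12.
    print("lmcache 0.3.12 installed:", check_pin("lmcache", "0.3.12"))
```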
## Usage

We introduce a dynamic KVConnector, LMCacheAscendConnectorV1Dynamic, so the LMCache-Ascend connector can be enabled through the KV transfer config in either of the following two settings.
### Online serving

```bash
python \
  -m vllm.entrypoints.openai.api_server \
  --port 8100 \
  --model /data/models/Qwen/Qwen3-32B \
  --trust-remote-code \
  --disable-log-requests \
  --block-size 128 \
  --kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```
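Once the server is up, any OpenAI-compatible client can send requests to it. The stdlib-only sketch below targets the `/v1/completions` endpoint on port 8100 from the command above; it assumes the default served model name, which in vLLM is the `--model` path. The helper names and `max_tokens` value are illustrative.

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             host: str = "localhost",
                             port: int = 8100,
                             model: str = "/data/models/Qwen/Qwen3-32B",
                             max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST to the server's OpenAI-compatible /v1/completions endpoint."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"http://{host}:{port}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the first completion's text."""
    req = build_completion_request(prompt)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

Sending two requests that share a long prompt prefix is one simple way to exercise the KV cache path through the connector.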
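### Offline

```python
from vllm import LLM
from vllm.config import KVTransferConfig
# Select the LMCache-Ascend connector; kv_both = this instance stores and loads KV cache.
ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnector",
    kv_role="kv_both",
)
llm = LLM(model="/data/models/Qwen/Qwen3-32B", block_size=128, kv_transfer_config=ktc)
```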