# LMCache-Ascend Deployment Guide

## Overview

LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.

This page is a short deployment guide. For more details on deployment, refer to the [LMCache-Ascend README](https://github.com/LMCache/LMCache-Ascend/blob/main/README.md).

## Getting Started

### Clone LMCache-Ascend Repo

Our repo contains a KV cache ops submodule for ease of maintenance, so we recommend cloning it with submodules.

```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```
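
If the repo was cloned without `--recurse-submodules`, the submodule directory stays empty and a later build is likely to fail. The sketch below is one way to detect that case: it parses the output of `git submodule status`, where a leading `-` marks a submodule that has not been initialized. The helper names are illustrative and are not part of LMCache-Ascend.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        # A leading '-' means the submodule has not been initialized.
        if line.startswith("-"):
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])  # fields are: <sha> <path>
    return paths


def check_repo(repo_dir: str = "/workspace/LMCache-Ascend") -> list[str]:
    """Run `git submodule status --recursive` in repo_dir and list the gaps."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

If `check_repo()` returns a non-empty list, running `git submodule update --init --recursive` inside the repo fetches the missing submodules.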

### Docker

```bash
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
```

Once the image is built, run it with the following command:

```bash
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
    --privileged \
    --cap-add=SYS_RESOURCE \
    --cap-add=IPC_LOCK \
    -p 8000:8000 \
    -p 8001:8001 \
    --name lmcache-ascend-dev \
    -e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_TOTAL_MEMORY_GB=32 \
    -e VLLM_TARGET_DEVICE=npu \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /etc/localtime:/etc/localtime \
    -v /var/log/npu:/var/log/npu \
    -v /dev/davinci_manager:/dev/davinci_manager \
    -v /dev/devmm_svm:/dev/devmm_svm \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /etc/hccn.conf:/etc/hccn.conf \
    lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
    /bin/bash
```
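
Docker's `-v` flag does not fail when a host path is missing; it creates an empty directory at that path instead, which can hide a missing driver or device node. As a minimal pre-flight sketch, the script below checks on the host that each mounted path exists. The path list is copied from the `-v` flags of the command above; the function name is illustrative.

```python
from pathlib import Path

# Host paths taken from the -v flags of the docker run command.
HOST_PATHS = [
    "/usr/local/Ascend/driver",
    "/etc/localtime",
    "/var/log/npu",
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/etc/ascend_install.info",
    "/etc/hccn.conf",
]


def missing_paths(paths: list[str]) -> list[str]:
    """Return the subset of paths that do not exist on this machine."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    missing = missing_paths(HOST_PATHS)
    if missing:
        print("Missing host paths:", ", ".join(missing))
    else:
        print("All host paths are present.")
```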

### Manual Installation

The following steps assume that your working directory is `/workspace` and that vLLM and vLLM-Ascend are already installed.

1. Install LMCache

```bash
NO_CUDA_EXT=1 pip install lmcache==0.3.12
```

2. Install LMCache-Ascend from the cloned repo

```bash
cd /workspace/LMCache-Ascend
python3 -m pip install -v --no-build-isolation -e .
```
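
After both steps, it can help to confirm which versions ended up in the environment. The sketch below uses the standard-library `importlib.metadata` to look up installed distributions. `lmcache` is the name used in the `pip install` command above; `lmcache-ascend` is an assumed distribution name for the editable install and may differ in your checkout.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names=("lmcache", "lmcache-ascend")) -> dict:
    # "lmcache-ascend" is an assumed distribution name, not confirmed by this guide.
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```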

### Usage

We introduce a dynamic KVConnector via `LMCacheAscendConnectorV1Dynamic`, so the LMCache-Ascend connector can be enabled through the KV transfer config (with `kv_connector` set to `LMCacheAscendConnector`) in the following two settings.
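
Both settings pass the same two fields, `kv_connector` and `kv_role`. When launching the server from a script, generating the JSON with `json.dumps` and quoting it with `shlex.quote` helps avoid shell-quoting mistakes. This is a small sketch; the helper name is illustrative.

```python
import json
import shlex


def kv_transfer_flag(connector: str = "LMCacheAscendConnector",
                     role: str = "kv_both") -> str:
    """Build a shell-safe --kv-transfer-config argument."""
    config = {"kv_connector": connector, "kv_role": role}
    # Compact separators reproduce the whitespace-free JSON used in this guide.
    payload = json.dumps(config, separators=(",", ":"))
    return f"--kv-transfer-config {shlex.quote(payload)}"


print(kv_transfer_flag())
```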

#### Online serving

```bash
python \
    -m vllm.entrypoints.openai.api_server \
    --port 8100 \
    --model /data/models/Qwen/Qwen3-32B \
    --trust-remote-code \
    --disable-log-requests \
    --block-size 128 \
    --kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```
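
Once the server is up, one way to exercise the cache is to send the same long prompt twice and compare latencies, since the second request may reuse KV cache stored by the first. The sketch below uses only the standard library and targets vLLM's OpenAI-compatible `/v1/completions` endpoint on port 8100, as in the command above. The host, prompt, and `max_tokens` value are assumptions for illustration, and any speedup depends on your LMCache configuration.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8100"  # assumed host; port matches --port above
MODEL = "/data/models/Qwen/Qwen3-32B"  # served name defaults to the --model value


def build_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    body = json.dumps(
        {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_completion(prompt: str) -> float:
    """Send one completion request and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt)) as resp:
        resp.read()  # drain the response so timing covers the full reply
    return time.perf_counter() - start
```

Calling `timed_completion` twice with an identical prompt of a few thousand tokens and comparing the two timings gives a rough indication of whether cached KV data is being reused.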

#### Offline

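```python
from vllm.config import KVTransferConfig

# Use the LMCache-Ascend connector for both storing and loading KV cache.
ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnector",
    kv_role="kv_both",
)
```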
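
For offline inference, the `ktc` object is passed to `LLM` through its `kv_transfer_config` argument. The following is a minimal sketch that assumes vLLM, vLLM-Ascend, and LMCache-Ascend are installed; the model path is reused from the online example, the sampling settings are placeholders, and `run_offline` is an illustrative name.

```python
def run_offline(ktc, prompts, model="/data/models/Qwen/Qwen3-32B"):
    """Generate completions using the given KVTransferConfig."""
    # Imported lazily so the function can be defined without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, kv_transfer_config=ktc)
    params = SamplingParams(temperature=0.0, max_tokens=32)  # placeholder values
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [out.outputs[0].text for out in outputs]
```

A call such as `run_offline(ktc, ["Hello, my name is"])` returns one generated string per prompt.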