
# LMCache-Ascend Deployment Guide

## Overview

LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.

This page is a short deployment guide. For more detailed deployment notes, see the [LMCache-Ascend README](https://github.com/LMCache/LMCache-Ascend/blob/main/README.md).

## Getting Started

### Clone LMCache-Ascend Repo

The repo includes a kvcache ops submodule for ease of maintenance, so we recommend cloning it together with its submodules.

```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```
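If the repo was cloned without `--recurse-submodules`, the submodule directories exist but are empty. The following is a minimal sketch for spotting that case. It assumes only that the repo root contains a standard `.gitmodules` file; the helper name `missing_submodules` is hypothetical and not part of LMCache-Ascend.

```python
from pathlib import Path


def missing_submodules(repo_root):
    """Return submodule paths from .gitmodules that are absent or empty."""
    repo_root = Path(repo_root)
    gitmodules = repo_root / ".gitmodules"
    if not gitmodules.is_file():
        return []  # no submodules declared in this checkout

    # Collect every "path = <dir>" entry from .gitmodules.
    paths = []
    for line in gitmodules.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "path":
            paths.append(value.strip())

    missing = []
    for rel_path in paths:
        target = repo_root / rel_path
        # An uninitialized submodule is a missing or empty directory.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    for path in missing_submodules("/workspace/LMCache-Ascend"):
        print(f"submodule not initialized: {path}")
```

If anything is reported, running `git submodule update --init --recursive` inside the repo should populate the missing directories.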

### Docker

```bash
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
```

Once the image is built, start a container with the following command:

```bash
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
    --privileged \
    --cap-add=SYS_RESOURCE \
    --cap-add=IPC_LOCK \
    -p 8000:8000 \
    -p 8001:8001 \
    --name lmcache-ascend-dev \
    -e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
    -e ASCEND_TOTAL_MEMORY_GB=32 \
    -e VLLM_TARGET_DEVICE=npu \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /etc/localtime:/etc/localtime \
    -v /var/log/npu:/var/log/npu \
    -v /dev/davinci_manager:/dev/davinci_manager \
    -v /dev/devmm_svm:/dev/devmm_svm \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /etc/hccn.conf:/etc/hccn.conf \
    lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
    /bin/bash
```
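The `docker run` command above bind-mounts several host paths. With `-v`, Docker creates a missing host path as an empty directory rather than failing, which can hide a missing driver or device file. As a hedged pre-flight sketch, the script below lists which of those paths are absent on the host. The path list is copied from the command above; the helper name `missing_host_paths` is hypothetical.

```python
import os

# Host paths bind-mounted by the `docker run` command above.
HOST_MOUNTS = [
    "/usr/local/Ascend/driver",
    "/etc/localtime",
    "/var/log/npu",
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/etc/ascend_install.info",
    "/etc/hccn.conf",
]


def missing_host_paths(paths=HOST_MOUNTS):
    """Return the subset of `paths` that does not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    for path in missing_host_paths():
        print(f"missing on host: {path}")
```

Run it on the host before starting the container. An empty output means every mount source is present.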

### Manual Installation

The following steps assume that your working directory is `/workspace` and that vLLM and vLLM-Ascend are already installed.

1. Install LMCache

    ```bash
    NO_CUDA_EXT=1 pip install lmcache==0.3.12
    ```

2. Install LMCache-Ascend from the cloned repo

    ```bash
    cd /workspace/LMCache-Ascend
    python3 -m pip install -v --no-build-isolation -e .
    ```
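After both install steps, it can help to confirm what actually landed in the environment. The sketch below uses only the standard library to report installed versions. The name `lmcache` comes from the pip command above; the other distribution names (`lmcache-ascend`, `vllm`, `vllm-ascend`) are assumptions and may differ in your environment, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Only "lmcache" is taken from the guide; the rest are assumed names.
    names = ["lmcache", "lmcache-ascend", "vllm", "vllm-ascend"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

For this guide, `lmcache` would be expected to report `0.3.12`.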

### Usage

LMCache-Ascend provides a dynamic KVConnector, `LMCacheAscendConnectorV1Dynamic`, so the LMCache-Ascend connector can be selected through the KV transfer config in the following two settings.

#### Online serving

```bash
python \
    -m vllm.entrypoints.openai.api_server \
    --port 8100 \
    --model /data/models/Qwen/Qwen3-32B \
    --trust-remote-code \
    --disable-log-requests \
    --block-size 128 \
    --kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```
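Once the server is up, a quick request confirms that it responds. The following is a minimal client sketch using only the standard library. It assumes the client runs somewhere port 8100 is reachable (the `docker run` command above publishes only ports 8000 and 8001) and that the model is served under its `--model` path, since no other served name is set. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8100"      # matches --port in the command above
MODEL = "/data/models/Qwen/Qwen3-32B"   # matches --model in the command above


def build_completion_request(prompt, max_tokens=32):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("Hello, Ascend!")` returns the generated text. Sending the same long prompt twice is one informal way to see whether cached KV is being reused, though the effect depends on LMCache settings not covered here.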

#### Offline

```python
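from vllm.config import KVTransferConfig

# Select the LMCache-Ascend connector; "kv_both" lets this instance
# act as both KV producer and KV consumer.
ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnector",
    kv_role="kv_both",
)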
```