# LMCache-Ascend Deployment Guide
## Overview
LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.
This page provides a brief deployment guide. For further deployment notes, please refer to the [LMCache-Ascend documentation](https://github.com/LMCache/LMCache-Ascend/blob/main/README.md).
## Getting Started
### Clone LMCache-Ascend Repo
The repo contains a KV cache ops submodule for ease of maintenance, so we recommend cloning it together with its submodules:
```bash
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
```
### Docker
```bash
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
```
Once the image is built, run it with the following command:
```bash
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
--privileged \
--cap-add=SYS_RESOURCE \
--cap-add=IPC_LOCK \
-p 8000:8000 \
-p 8001:8001 \
--name lmcache-ascend-dev \
-e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_TOTAL_MEMORY_GB=32 \
-e VLLM_TARGET_DEVICE=npu \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/localtime:/etc/localtime \
-v /var/log/npu:/var/log/npu \
-v /dev/davinci_manager:/dev/davinci_manager \
-v /dev/devmm_svm:/dev/devmm_svm \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/hccn.conf:/etc/hccn.conf \
lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
/bin/bash
```
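Inside the container, you can confirm that the device list was passed through by reading the two environment variables set by `docker run`. In the command above both are set from `DEVICE_LIST`, so they should agree. This is a minimal sketch; the helper name `visible_devices` is hypothetical and not part of LMCache-Ascend.

```python
import os


def visible_devices(env=None):
    """Parse the NPU device lists passed into the container (hypothetical helper).

    Returns the device IDs from ASCEND_RT_VISIBLE_DEVICES and raises if they
    disagree with ASCEND_VISIBLE_DEVICES. In the docker run command above, both
    are set from DEVICE_LIST.
    """
    env = os.environ if env is None else env
    lists = []
    for name in ("ASCEND_VISIBLE_DEVICES", "ASCEND_RT_VISIBLE_DEVICES"):
        raw = env.get(name, "")
        # "0,1,2" -> [0, 1, 2]; empty entries are skipped
        lists.append([int(x) for x in raw.split(",") if x.strip()])
    if lists[0] != lists[1]:
        raise ValueError(f"device lists differ: {lists[0]} vs {lists[1]}")
    return lists[1]


if __name__ == "__main__":
    print(visible_devices())
```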
### Manual Installation
These steps assume your working directory is `/workspace` and that vLLM and vLLM Ascend are already installed.
1. Install LMCache:
```bash
NO_CUDA_EXT=1 pip install lmcache==0.3.12
```
2. Install LMCache-Ascend from the cloned repo:
```bash
cd /workspace/LMCache-Ascend
python3 -m pip install -v --no-build-isolation -e .
```
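After both steps, a quick sanity check can confirm that the pinned LMCache version is what Python actually sees. The following minimal sketch uses the standard-library `importlib.metadata`. The helper names are hypothetical, and the distribution names `vllm` and `vllm-ascend` are assumptions that are only checked for presence.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Compare installed versions against expected pins (hypothetical helper).

    `pins` maps a distribution name to an expected version string, or to None
    to only require that the package is installed. Returns a list of problems.
    """
    problems = []
    for name, expected in pins.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif expected is not None and found != expected:
            problems.append(f"{name}: found {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    # lmcache is pinned to 0.3.12 in step 1; the vllm entries are assumed names
    issues = check_pins({"lmcache": "0.3.12", "vllm": None, "vllm-ascend": None})
    for line in issues or ["all OK"]:
        print(line)
```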
### Usage
LMCache-Ascend introduces a dynamic KV connector, `LMCacheAscendConnectorV1Dynamic`, so the LMCache-Ascend connector can be enabled through the KV transfer config in either of the following two settings.
#### Online serving
```bash
python -m vllm.entrypoints.openai.api_server \
--port 8100 \
--model /data/models/Qwen/Qwen3-32B \
--trust-remote-code \
--disable-log-requests \
--block-size 128 \
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
```
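The `--kv-transfer-config` value is a JSON object whose keys mirror the `KVTransferConfig` fields used in the offline example. When you launch the server from scripts, generating the string with `json.dumps` and quoting it with `shlex.quote` helps avoid shell-quoting mistakes. This is a minimal sketch; the helper name `kv_transfer_config_arg` is hypothetical.

```python
import json
import shlex


def kv_transfer_config_arg(connector="LMCacheAscendConnector", role="kv_both"):
    """Build the JSON string passed to vLLM's --kv-transfer-config flag."""
    # Compact separators reproduce the exact JSON used in the command above
    return json.dumps({"kv_connector": connector, "kv_role": role}, separators=(",", ":"))


if __name__ == "__main__":
    # shlex.quote wraps the JSON in single quotes so it is safe on a shell command line
    print("--kv-transfer-config " + shlex.quote(kv_transfer_config_arg()))
```

With the defaults, this prints the same `--kv-transfer-config '...'` argument shown in the online serving command.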
#### Offline inference
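```python
from vllm.config import KVTransferConfig

# Enable the LMCache-Ascend connector for both storing and loading KV cache
ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnector",
    kv_role="kv_both",
)
```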