LMCache-Ascend Deployment Guide
Overview
LMCache-Ascend is a community-maintained plugin for running LMCache on Ascend NPUs.
This page provides a brief deployment guide. For further deployment notes, please refer to the LMCache-Ascend documentation.
Getting Started
Clone LMCache-Ascend Repo
Our repo contains a KV cache ops submodule for ease of maintenance, so we recommend cloning the repo with submodules:
cd /workspace
git clone --recurse-submodules https://github.com/LMCache/LMCache-Ascend.git
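If you have already cloned the repo without --recurse-submodules, the submodule can still be fetched afterwards. This is standard git behavior, not specific to LMCache-Ascend; run it from inside the checkout:

```shell
# Fetch any submodules that were skipped during the initial clone
# (equivalent to what --recurse-submodules does at clone time).
git submodule update --init --recursive
```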
Docker
cd /workspace/LMCache-Ascend
docker build -f docker/Dockerfile.a2.openEuler -t lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler .
Once the image is built, run it with the following command:
DEVICE_LIST="0,1,2,3,4,5,6,7"
docker run -it \
--privileged \
--cap-add=SYS_RESOURCE \
--cap-add=IPC_LOCK \
-p 8000:8000 \
-p 8001:8001 \
--name lmcache-ascend-dev \
-e ASCEND_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_RT_VISIBLE_DEVICES=${DEVICE_LIST} \
-e ASCEND_TOTAL_MEMORY_GB=32 \
-e VLLM_TARGET_DEVICE=npu \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/localtime:/etc/localtime \
-v /var/log/npu:/var/log/npu \
-v /dev/davinci_manager:/dev/davinci_manager \
-v /dev/devmm_svm:/dev/devmm_svm \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/hccn.conf:/etc/hccn.conf \
lmcache-ascend:v0.3.12-vllm-ascend-v0.11.0-openeuler \
/bin/bash
Manual Installation
This assumes your working directory is /workspace and that vllm and vllm-ascend are already installed.
Install LMCache
NO_CUDA_EXT=1 pip install lmcache==0.3.12
Install LMCache-Ascend
cd /workspace/LMCache-Ascend
python3 -m pip install -v --no-build-isolation -e .
Usage
We introduce a dynamic KV connector, LMCacheAscendConnectorV1Dynamic, so the LMCache-Ascend connector can be enabled through the KV transfer config in the following two settings.
Online serving
python \
-m vllm.entrypoints.openai.api_server \
--port 8100 \
--model /data/models/Qwen/Qwen3-32B \
--trust-remote-code \
--disable-log-requests \
--block-size 128 \
--kv-transfer-config '{"kv_connector":"LMCacheAscendConnector","kv_role":"kv_both"}'
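The value passed to --kv-transfer-config is a JSON object. If you script the launch command, building the string with json.dumps avoids shell-quoting mistakes; a minimal sketch using only the two fields from the example above:

```python
import json

# The --kv-transfer-config flag takes a JSON object. Building it
# programmatically avoids shell-quoting mistakes in launch scripts.
kv_transfer_config = json.dumps({
    "kv_connector": "LMCacheAscendConnector",
    "kv_role": "kv_both",
})
print(kv_transfer_config)
# Pass the printed string verbatim as:
#   --kv-transfer-config '<printed string>'
```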
Offline inference
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="LMCacheAscendConnector",
    kv_role="kv_both",
)
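In vLLM's KVTransferConfig, kv_role distinguishes prefill-side producers ("kv_producer"), decode-side consumers ("kv_consumer"), and single instances that both read and write the cache ("kv_both", used throughout this guide). A small helper sketch illustrating the three roles; the helper function is ours for illustration, not part of vLLM or LMCache:

```python
import json

# kv_role determines how an instance participates in KV cache transfer:
#   "kv_producer" - prefill instance that writes KV cache
#   "kv_consumer" - decode instance that reads KV cache
#   "kv_both"     - one instance that both reads and writes (this guide)
def make_kv_transfer_config(role: str) -> str:
    """Serialize a kv transfer config for the given role (hypothetical helper)."""
    allowed = {"kv_producer", "kv_consumer", "kv_both"}
    if role not in allowed:
        raise ValueError(f"kv_role must be one of {sorted(allowed)}, got {role!r}")
    return json.dumps({"kv_connector": "LMCacheAscendConnector", "kv_role": role})

print(make_kv_transfer_config("kv_both"))
```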