[Doc] Update doc to work with release (#85)

1. Update CANN image name
2. Add PTA install step
3. Update the vllm-ascend Docker image name to ghcr.io
4. Update quick_start to use the vllm-ascend image directly
5. Fix `note` style

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan, 2025-02-19 09:51:43 +08:00 (committed by GitHub)
parent 17de078d83
commit fafd70e91c
11 changed files with 119 additions and 132 deletions


@@ -6,92 +6,33 @@
 - Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
 - Atlas 800I A2 Inference series (Atlas 800I A2)
-<!-- TODO(yikun): replace "Prepare Environment" and "Installation" with "Running with vllm-ascend container image" -->
-### Prepare Environment
-You can use the container image directly with one line command:
-```bash
-# Update DEVICE according to your device (/dev/davinci[0-7])
-DEVICE=/dev/davinci7
-IMAGE=quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10
-docker run \
---name vllm-ascend-env --device $DEVICE \
---device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc \
--v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
--v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
--v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
--v /etc/ascend_install.info:/etc/ascend_install.info \
--v /root/.cache:/root/.cache \
--it --rm $IMAGE bash
-```
-You can verify by running below commands in above container shell:
-```bash
-npu-smi info
-```
-You will see following message:
-```
-+-------------------------------------------------------------------------------------------+
-| npu-smi 23.0.2 Version: 23.0.2 |
-+----------------------+---------------+----------------------------------------------------+
-| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
-| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
-+======================+===============+====================================================+
-| 0 xxx | OK | 0.0 40 0 / 0 |
-| 0 | 0000:C1:00.0 | 0 882 / 15169 0 / 32768 |
-+======================+===============+====================================================+
-```
-## Installation
-Prepare:
-```bash
-apt update
-apt install git curl vim -y
-# Config pypi mirror to speedup
-pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
-```
-Create your venv
-```bash
-python3 -m venv .venv
-source .venv/bin/activate
-pip install --upgrade pip
-```
-You can install vLLM and vllm-ascend plugin by using:
+## Setup environment using container
 ```{code-block} bash
 :substitutions:
-# Install vLLM (About 5 mins)
-git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm.git
-cd vllm
-VLLM_TARGET_DEVICE=empty pip install .
-cd ..
+# You can change version a suitable one base on your requirement, e.g. main
+export IMAGE=ghcr.io/vllm-project/vllm-ascend:|vllm_newest_release_version|
-# Install vLLM Ascend Plugin:
-git clone --depth 1 --branch |vllm_ascend_version| https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-cd ..
+docker run \
+--name vllm-ascend \
+--device /dev/davinci0 \
+--device /dev/davinci_manager \
+--device /dev/devmm_svm \
+--device /dev/hisi_hdc \
+-v /usr/local/dcmi:/usr/local/dcmi \
+-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
+-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
+-v /etc/ascend_install.info:/etc/ascend_install.info \
+-v /root/.cache:/root/.cache \
+-p 8000:8000 \
+-it $IMAGE bash
 ```
 ## Usage
-After vLLM and vLLM Ascend plugin installation, you can start to
-try [vLLM QuickStart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).
-You have two ways to start vLLM on Ascend NPU:
+There are two ways to start vLLM on Ascend NPU:
 ### Offline Batched Inference with vLLM
@@ -99,7 +40,6 @@ With vLLM installed, you can start generating texts for list of input prompts (i
```bash
# Use Modelscope mirror to speed up download
pip install modelscope
export VLLM_USE_MODELSCOPE=true
```
@@ -132,7 +72,6 @@ the following command to start the vLLM server with the
```bash
# Use Modelscope mirror to speed up download
pip install modelscope
export VLLM_USE_MODELSCOPE=true
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
@@ -178,7 +117,7 @@ kill -2 $VLLM_PID
You will see output as below:
```
INFO 02-12 03:34:10 launcher.py:59] Shutting down FastAPI HTTP server.
INFO: Shutting down FastAPI HTTP server.
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
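
The `docker run` commands in the first hunk pass each NPU device node explicitly. The removed version uses a `DEVICE` variable documented as `/dev/davinci[0-7]`, and the added version hard-codes `/dev/davinci0`. Both also pass `/dev/davinci_manager`, `/dev/devmm_svm` and `/dev/hisi_hdc`. Below is a minimal sketch of building those flags from an NPU index instead of editing the command by hand. The function `npu_device_args` is hypothetical and not part of vllm-ascend or Docker. It assumes the `/dev/davinci[0-7]` naming from the removed comment, and it does not check that the nodes actually exist on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-ascend or Docker): print the
# `docker run` --device flags for one Ascend NPU, one argument per line.
# Assumes the /dev/davinci[0-7] naming from the removed quickstart comment.
npu_device_args() {
  local idx="${1-}"
  # Accept a single digit 0-7 only, matching /dev/davinci[0-7].
  if [[ ! "$idx" =~ ^[0-7]$ ]]; then
    echo "npu_device_args: NPU index must be 0-7, got '${idx}'" >&2
    return 1
  fi
  # Per-chip node first, then the management nodes that both
  # docker run commands in the diff also pass.
  printf '%s\n' \
    --device "/dev/davinci${idx}" \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc
}

# Example: collect the flags for NPU 3 into an array (mapfile needs Bash 4+).
# The docker call stays commented out: it only works on an Ascend host.
if flags="$(npu_device_args 3)"; then
  mapfile -t dev_args <<< "$flags"
  echo "Device flags: ${dev_args[*]}"
  # docker run --name vllm-ascend "${dev_args[@]}" -p 8000:8000 -it "$IMAGE" bash
fi
```

The example prints `Device flags: --device /dev/davinci3 --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc`. The helper emits one argument per line so that, once the output is loaded into an array, each path reaches `docker run` as a single word. An out-of-range index fails before Docker is ever invoked.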