[Doc] Update doc to work with release (#85)

1. Update CANN image name
2. Add pta install step
3. Update vllm-ascend docker image name to ghcr
4. Update quick_start to use vllm-ascend image directly
5. Fix `note` style

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-19 09:51:43 +08:00
parent 17de078d83
commit fafd70e91c
11 changed files with 119 additions and 132 deletions
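
The `docker run` command this commit adds to `quick_start.md` passes four Ascend device nodes with `--device` and assumes they exist on the host. A minimal sketch of a pre-flight check follows. The helper name `missing_paths` is hypothetical and not part of vllm-ascend, and the device list is copied from the diff, so adjust `/dev/davinci0` if you use a different card.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-ascend): print each given path that
# does not exist on this host, one per line.
missing_paths() {
    local path
    for path in "$@"; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Device nodes passed via --device in the docker run command from this commit.
missing=$(missing_paths /dev/davinci0 /dev/davinci_manager /dev/devmm_svm /dev/hisi_hdc)
if [ -n "$missing" ]; then
    printf 'Missing on this host:\n%s\n' "$missing"
else
    echo "All expected device nodes are present."
fi
```

Running this on the host before starting the container may help separate a missing driver or device from a problem inside the image.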
--- a/docs/source/quick_start.md
+++ b/docs/source/quick_start.md
@@ -6,92 +6,33 @@
 - Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
 - Atlas 800I A2 Inference series (Atlas 800I A2)

-<!-- TODO(yikun): replace "Prepare Environment" and "Installation" with "Running with vllm-ascend container image" -->
-
-### Prepare Environment
-
-You can use the container image directly with one line command:
-
-```bash
-# Update DEVICE according to your device (/dev/davinci[0-7])
-DEVICE=/dev/davinci7
-IMAGE=quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10
-docker run \
-    --name vllm-ascend-env --device $DEVICE \
-    --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -v /root/.cache:/root/.cache \
-    -it --rm $IMAGE bash
-```
-
-You can verify by running below commands in above container shell:
-
-```bash
-npu-smi info
-```
-
-You will see following message:
-
-```
-+-------------------------------------------------------------------------------------------+
-| npu-smi 23.0.2              Version: 23.0.2                                               |
-+----------------------+---------------+----------------------------------------------------+
-| NPU   Name           | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
-| Chip                 | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
-+======================+===============+====================================================+
-| 0     xxx            | OK            | 0.0         40                0    / 0             |
-| 0                    | 0000:C1:00.0  | 0           882  / 15169      0    / 32768         |
-+======================+===============+====================================================+
-```
-
-
-## Installation
-
-Prepare:
-
-```bash
-apt update
-apt install git curl vim -y
-# Config pypi mirror to speedup
-pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
-```
-
-Create your venv
-
-```bash
-python3 -m venv .venv
-source .venv/bin/activate
-pip install --upgrade pip
-```
-
-You can install vLLM and vllm-ascend plugin by using:
+## Setup environment using container

 ```{code-block} bash
   :substitutions:

-# Install vLLM (About 5 mins)
-git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm.git
-cd vllm
-VLLM_TARGET_DEVICE=empty pip install .
-cd ..
+# You can change the version to one that suits your requirements, e.g. main
+export IMAGE=ghcr.io/vllm-project/vllm-ascend:|vllm_newest_release_version|

-# Install vLLM Ascend Plugin:
-git clone --depth 1 --branch |vllm_ascend_version| https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-cd ..
+docker run \
+--name vllm-ascend \
+--device /dev/davinci0 \
+--device /dev/davinci_manager \
+--device /dev/devmm_svm \
+--device /dev/hisi_hdc \
+-v /usr/local/dcmi:/usr/local/dcmi \
+-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
+-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
+-v /etc/ascend_install.info:/etc/ascend_install.info \
+-v /root/.cache:/root/.cache \
+-p 8000:8000 \
+-it $IMAGE bash
 ```

-
 ## Usage

-After vLLM and vLLM Ascend plugin installation, you can start to
-try [vLLM QuickStart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).
-
-You have two ways to start vLLM on Ascend NPU:
+There are two ways to start vLLM on Ascend NPU:

 ### Offline Batched Inference with vLLM

@@ -99,7 +40,6 @@ With vLLM installed, you can start generating texts for list of input prompts (i

 ```bash
 # Use Modelscope mirror to speed up download
-pip install modelscope
 export VLLM_USE_MODELSCOPE=true
 ```

@@ -132,7 +72,6 @@ the following command to start the vLLM server with the

 ```bash
 # Use Modelscope mirror to speed up download
-pip install modelscope
 export VLLM_USE_MODELSCOPE=true
 # Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
 vllm serve Qwen/Qwen2.5-0.5B-Instruct &
@@ -178,7 +117,7 @@ kill -2 $VLLM_PID

 You will see output as below:
 ```
-INFO 02-12 03:34:10 launcher.py:59] Shutting down FastAPI HTTP server.
+INFO:     Shutting down FastAPI HTTP server.
 INFO:     Shutting down
 INFO:     Waiting for application shutdown.
 INFO:     Application shutdown complete.