xc-llm-ascend/docs/source/installation.md

# Installation

This document describes how to install vllm-ascend manually.

## Requirements

- OS: Linux
- Python: 3.10 or higher
- A hardware with Ascend NPU. It's usually the Atlas 800 A2 series.
- Software:

    | Software     | Supported version | Note |
    | ------------ | ----------------- | ---- | 
    | CANN         | >= 8.0.0.beta1    | Required for vllm-ascend and torch-npu |
    | torch-npu    | >= 2.5.1rc1       | Required for vllm-ascend |
    | torch        | >= 2.5.1          | Required for torch-npu and vllm |

## Configure a new environment

Before installing the package,  you need to make sure firmware/driver and CANN is installed correctly.

### Install firmwares and drivers

To verify that the Ascend NPU firmware and driver were correctly installed, run `npu-smi` info

> Tips: Refer to [Ascend Environment Setup Guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) for more details.

### Install CANN (optional)

The installation of CANN wouldn’t be necessary if you are using a CANN container image, you can skip this step.If you want to install vllm-ascend on a bare environment by hand, you need install CANN first.

```bash
# Create a virtual environment
python -m venv vllm-ascend-env
source vllm-ascend-env/bin/activate

# Install required python packages.
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple attrs numpy==1.24.0 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions

# Download and install the CANN package.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-toolkit_8.0.0_linux-aarch64.run
sh Ascend-cann-toolkit_8.0.0_linux-aarch64.run --full
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run
sh Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run --full
```

Once it's done, you can read either **Set up using Python** or **Set up using Docker** section to install and use vllm-ascend.

## Set up using Python

> Notes: If you are installing vllm-ascend on an arch64 machine, The `-f https://download.pytorch.org/whl/torch/` command parameter in this section can be omitted. It's only used for find torch package on x86 machine.

Please make sure that CANN is installed. It can be done by **Configure a new environment** step. Or by using an CANN container directly:

```bash
# Setup a CANN container using docker
# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7

docker run --rm \
    --name vllm-ascend-env \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -it quay.io/ascend/cann:8.0.0.beta1-910b-ubuntu22.04-py3.10 bash
```

Then you can install vllm-ascend from **pre-built wheel** or **source code**.

### Install from Pre-built wheels (Not support yet)

1. Install vllm

    Since vllm on pypi is not compatible with cpu, we need to install vllm from source code.

    ```bash
    git clone --depth 1 --branch v0.7.1 https://github.com/vllm-project/vllm
    cd vllm
    VLLM_TARGET_DEVICE=empty pip install . -f https://download.pytorch.org/whl/torch/
    ```

2. Install vllm-ascend

    ```bash
    pip install vllm-ascend -f https://download.pytorch.org/whl/torch/
    ```

### Install from source code

1. Install vllm

    ```bash
    git clone https://github.com/vllm-project/vllm
    cd vllm
    VLLM_TARGET_DEVICE=empty pip install . -f https://download.pytorch.org/whl/torch/
    ```

2. Install vllm-ascend

    ```bash
    git clone https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    pip install -e . -f https://download.pytorch.org/whl/torch/
    ```

## Set up using Docker

> Tips: CANN, torch, torch_npu, vllm and vllm_ascend are pre-installed in the Docker image already.

### Pre-built images (Not support yet)

Just pull the image and run it with bash.

```bash
docker pull quay.io/ascend/vllm-ascend:latest

# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7

docker run --rm \
    --name vllm-ascend-env \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -it quay.io/ascend/vllm-ascend:0.7.1rc1 bash
```

### Build image from source

If you want to build the docker image from main branch, you can do it by following steps:

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .

# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7

docker run --rm \
    --name vllm-ascend-env \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -it vllm-ascend-dev-image:latest bash
```

## Extra information

### Verify installation

Create and run a simple inference test. The `example.py` can be like:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
# Create an LLM.
llm = LLM(model="facebook/opt-125m")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

Then run:

```bash
# export VLLM_USE_MODELSCOPE=true to speed up download if huggingface is not reachable.
python example.py
```
-												[Docs] Add official doc index (#29)

Add official doc index. Move the release content to the right place.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-11 12:00:27 +08:00
+								# Installation
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								This document describes how to install vllm-ascend manually.
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								## Requirements
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								- OS: Linux
 								- Python: 3.10 or higher
 								- A hardware with Ascend NPU. It's usually the Atlas 800 A2 series.
 								- Software:
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								    | Software     | Supported version | Note |
 								    | ------------ | ----------------- | ---- |
 								    | CANN         | >= 8.0.0.beta1    | Required for vllm-ascend and torch-npu |
 								    | torch-npu    | >= 2.5.1rc1       | Required for vllm-ascend |
 								    | torch        | >= 2.5.1          | Required for torch-npu and vllm |
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								## Configure a new environment
 								Before installing the package,  you need to make sure firmware/driver and CANN is installed correctly.
 								### Install firmwares and drivers
 								To verify that the Ascend NPU firmware and driver were correctly installed, run `npu-smi` info
 								> Tips: Refer to [Ascend Environment Setup Guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) for more details.
 								### Install CANN (optional)
 								The installation of CANN wouldn’t be necessary if you are using a CANN container image, you can skip this step.If you want to install vllm-ascend on a bare environment by hand, you need install CANN first.
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
 								```bash
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								# Create a virtual environment
 								python -m venv vllm-ascend-env
 								source vllm-ascend-env/bin/activate
 								# Install required python packages.
 								pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple attrs numpy==1.24.0 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions
 								# Download and install the CANN package.
 								wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-toolkit_8.0.0_linux-aarch64.run
 								sh Ascend-cann-toolkit_8.0.0_linux-aarch64.run --full
 								wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run
 								sh Ascend-cann-kernels-910b_8.0.0_linux-aarch64.run --full
 								```
 								Once it's done, you can read either **Set up using Python** or **Set up using Docker** section to install and use vllm-ascend.
 								## Set up using Python
 								> Notes: If you are installing vllm-ascend on an arch64 machine, The `-f https://download.pytorch.org/whl/torch/` command parameter in this section can be omitted. It's only used for find torch package on x86 machine.
 								Please make sure that CANN is installed. It can be done by **Configure a new environment** step. Or by using an CANN container directly:
 								```bash
 								# Setup a CANN container using docker
 								# Update DEVICE according to your device (/dev/davinci[0-7])
 								DEVICE=/dev/davinci7
 								docker run --rm \
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
+								    --name vllm-ascend-env \
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								    --device $DEVICE \
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
+								    --device /dev/davinci_manager \
 								    --device /dev/devmm_svm \
 								    --device /dev/hisi_hdc \
 								    -v /usr/local/dcmi:/usr/local/dcmi \
 								    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
 								    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
 								    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
 								    -v /etc/ascend_install.info:/etc/ascend_install.info \
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								    -it quay.io/ascend/cann:8.0.0.beta1-910b-ubuntu22.04-py3.10 bash
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
+								```
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								Then you can install vllm-ascend from **pre-built wheel** or **source code**.
 								### Install from Pre-built wheels (Not support yet)
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+. Install vllm
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								    Since vllm on pypi is not compatible with cpu, we need to install vllm from source code.
-												[Core] Init vllm-ascend (#3)

### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
											
										
										
											2025-02-05 10:53:12 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								    ```bash
 								    git clone --depth 1 --branch v0.7.1 https://github.com/vllm-project/vllm
 								    cd vllm
 								    VLLM_TARGET_DEVICE=empty pip install . -f https://download.pytorch.org/whl/torch/
 								    ```
-												Followup fix on official doc update (#34)

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-02-11 14:28:26 +08:00
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+. Install vllm-ascend
 								    ```bash
 								    pip install vllm-ascend -f https://download.pytorch.org/whl/torch/
 								    ```
 								### Install from source code
 . Install vllm
 								    ```bash
 								    git clone https://github.com/vllm-project/vllm
 								    cd vllm
 								    VLLM_TARGET_DEVICE=empty pip install . -f https://download.pytorch.org/whl/torch/
 								    ```
 . Install vllm-ascend
 								    ```bash
 								    git clone https://github.com/vllm-project/vllm-ascend.git
 								    cd vllm-ascend
 								    pip install -e . -f https://download.pytorch.org/whl/torch/
 								    ```
 								## Set up using Docker
 								> Tips: CANN, torch, torch_npu, vllm and vllm_ascend are pre-installed in the Docker image already.
 								### Pre-built images (Not support yet)
 								Just pull the image and run it with bash.
-												Followup fix on official doc update (#34)

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-02-11 14:28:26 +08:00
 								```bash
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								docker pull quay.io/ascend/vllm-ascend:latest
 								# Update DEVICE according to your device (/dev/davinci[0-7])
 								DEVICE=/dev/davinci7
 								docker run --rm \
 								    --name vllm-ascend-env \
 								    --device $DEVICE \
 								    --device /dev/davinci_manager \
 								    --device /dev/devmm_svm \
 								    --device /dev/hisi_hdc \
 								    -v /usr/local/dcmi:/usr/local/dcmi \
 								    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
 								    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
 								    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
 								    -v /etc/ascend_install.info:/etc/ascend_install.info \
 								    -it quay.io/ascend/vllm-ascend:0.7.1rc1 bash
-												Followup fix on official doc update (#34)

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-02-11 14:28:26 +08:00
+								```
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
+								### Build image from source
 								If you want to build the docker image from main branch, you can do it by following steps:
-												Followup fix on official doc update (#34)

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-02-11 14:28:26 +08:00
+								```bash
 								git clone https://github.com/vllm-project/vllm-ascend.git
 								cd vllm-ascend
-												[Doc] Add install doc (#49)

Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
											
										
										
											2025-02-14 10:22:15 +08:00
 								docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .
 								# Update DEVICE according to your device (/dev/davinci[0-7])
 								DEVICE=/dev/davinci7
 								docker run --rm \
 								    --name vllm-ascend-env \
 								    --device $DEVICE \
 								    --device /dev/davinci_manager \
 								    --device /dev/devmm_svm \
 								    --device /dev/hisi_hdc \
 								    -v /usr/local/dcmi:/usr/local/dcmi \
 								    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
 								    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
 								    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
 								    -v /etc/ascend_install.info:/etc/ascend_install.info \
 								    -it vllm-ascend-dev-image:latest bash
 								```
 								## Extra information
 								### Verify installation
 								Create and run a simple inference test. The `example.py` can be like:
 								```python
 								from vllm import LLM, SamplingParams
 								prompts = [
 								    "Hello, my name is",
 								    "The president of the United States is",
 								    "The capital of France is",
 								    "The future of AI is",
 								]
 								# Create a sampling params object.
 								sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
 								# Create an LLM.
 								llm = LLM(model="facebook/opt-125m")
 								# Generate texts from the prompts.
 								outputs = llm.generate(prompts, sampling_params)
 								for output in outputs:
 								    prompt = output.prompt
 								    generated_text = output.outputs[0].text
 								    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 								```
 								Then run:
 								```bash
 								# export VLLM_USE_MODELSCOPE=true to speed up download if huggingface is not reachable.
 								python example.py
-												Followup fix on official doc update (#34)

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
											
										
										
											2025-02-11 14:28:26 +08:00
+								```