[Docs] Add official doc index (#29)
Add official doc index. Move the release content to the right place. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
85
README.md
@@ -31,20 +31,11 @@ This plugin is the recommended approach for supporting the Ascend backend within
By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0

Learn how to set up your environment [here](docs/environment.md).
Learn how to set up your environment step by step [here](docs/installation.md).
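
As a minimal sketch, the minimum versions listed above could be checked with a small shell helper before installing. `version_ge` and `check_min` are hypothetical names introduced here for illustration, not part of vllm-ascend, and the comparison assumes a `sort` that supports `-V` (version sort, as in GNU coreutils). Pre-release suffixes such as `rc1` or `RC2` may not order as expected, so treat the result as a hint rather than a guarantee.

```bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; assumes `sort -V` is available.
version_ge() {
    # The smaller of the two versions sorts first; A >= B when that one is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: report whether FOUND meets the REQUIRED minimum.
check_min() {
    if version_ge "$2" "$3"; then
        echo "OK: $1 $2 (>= $3)"
    else
        echo "TOO OLD: $1 $2 (< $3)"
        return 1
    fi
}
```

For example, `check_min Python "$(python3 -c 'import platform; print(platform.python_version())')" 3.9` compares the local interpreter against the Python minimum given above.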

## Getting Started

@@ -73,78 +64,14 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-I
vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```

Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
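
The server may need some time to load the model before `/v1/models` responds. A minimal sketch of a polling helper follows; `wait_for_endpoint` is a hypothetical name, and the retry count and delay are arbitrary defaults chosen for illustration, not values recommended by vLLM.

```bash
# wait_for_endpoint URL [RETRIES] [DELAY_SECONDS]
# Hypothetical helper: polls URL with curl until it answers with a success
# status or the retry budget is used up.
wait_for_endpoint() {
    _url="$1"
    _retries="${2:-30}"
    _delay="${3:-2}"
    _i=1
    while [ "$_i" -le "$_retries" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: drop the body
        if curl -sf -o /dev/null "$_url"; then
            echo "ready after $_i attempt(s)"
            return 0
        fi
        sleep "$_delay"
        _i=$((_i + 1))
    done
    echo "not ready after $_retries attempt(s)" >&2
    return 1
}
```

With the server started in another terminal, `wait_for_endpoint http://localhost:8000/v1/models` returns once the endpoint answers, or fails after the retries run out.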

## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```

See [Building and Testing](./CONTRIBUTING.md) for a step-by-step guide to setting up a development environment, building, and testing.

## Feature Support Matrix
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ✗ | Planned for 2025 Q1 |
| Enc-dec | ✗ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ✗ | Planned for 2025 Q1 |

## Model Support Matrix

The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
| Model | Supported | Note |
|-------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |
**Please refer to [Official Docs](./docs/index.md) for more details.**

## Contributing
See [CONTRIBUTING](./CONTRIBUTING.md) for a step-by-step guide to setting up a development environment, building, and testing.

We welcome and value any contributions and collaborations:
- Please feel free to leave comments [here](https://github.com/vllm-project/vllm-ascend/issues/19) about your usage of the vLLM Ascend Plugin.
- Please let us know if you encounter a bug by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
- Please see the guidance on how to contribute in [CONTRIBUTING.md](./CONTRIBUTING.md).

## License

87
README.zh.md
@@ -30,21 +30,12 @@ The vLLM Ascend plugin (`vllm-ascend`) lets vLLM run seamlessly on the Ascend NPU

By using the vLLM Ascend plugin, popular large language models, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multi-modal models, can run seamlessly on the Ascend NPU.

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)
## Preparation

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0

Learn more about how to configure your environment [here](docs/environment.zh.md).
Find a step-by-step guide to setting up your environment [here](docs/installation.md).

## Getting Started

@@ -74,78 +65,14 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models
```

Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.

## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build container image
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```

See [Building and Testing](./CONTRIBUTING.zh.md) for more details, including a step-by-step guide to setting up a development environment, building, and testing.

## Feature Support Matrix
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ✗ | Planned for 2025 Q1 |
| Enc-dec | ✗ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ✗ | Planned for 2025 Q1 |

## Model Support Matrix

The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
| Model | Supported | Note |
|-------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |

**Please refer to the [Official Docs](./docs/index.md) for more details.**

## Contributing
See [CONTRIBUTING](./CONTRIBUTING.md) for more details on setting up a development environment, building, and testing.

We welcome and value any form of contribution and collaboration:
- You can share feedback on your usage experience [here](https://github.com/vllm-project/vllm-ascend/issues/19).
- Please let us know about any bug you encounter by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
- Please see the contribution guide in [CONTRIBUTING.zh.md](./CONTRIBUTING.zh.md).

## License

@@ -1,38 +0,0 @@
### Prepare Ascend NPU environment

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |


The following is a brief guide to installing the recommended software versions:

#### Containerized installation

You can use the [container image](https://hub.docker.com/r/ascendai/cann) directly with a single command:

```bash
docker run \
    --name vllm-ascend-env \
    --device /dev/davinci1 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -it quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10 bash
```

You do not need to install `torch` and `torch_npu` manually; they are installed automatically as dependencies of `vllm-ascend`.
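
The `docker run` command above maps a single NPU (`/dev/davinci1`). As a rough sketch, the `--device` flags could instead be generated for whichever device nodes exist on the host. `device_flags` is a hypothetical helper, and the assumption that other NPUs appear as `/dev/davinci0`, `/dev/davinci2`, and so on is inferred from the single path used above rather than confirmed here.

```bash
# device_flags PATH...: print a "--device PATH" flag for every path that exists.
# Hypothetical helper; paths that do not exist on this host are skipped.
device_flags() {
    for _dev in "$@"; do
        if [ -e "$_dev" ]; then
            printf '%s %s ' --device "$_dev"
        fi
    done
    echo
}
```

It could be used as `docker run $(device_flags /dev/davinci[0-9]* /dev/davinci_manager /dev/devmm_svm /dev/hisi_hdc)` followed by the remaining options from the command above. The substitution is left unquoted on purpose so that the shell splits the output into separate arguments.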

#### Manual installation

Alternatively, you can install manually by following the instructions in the [Ascend installation guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) to set up your environment.
15
docs/index.md
Normal file
@@ -0,0 +1,15 @@
# Ascend plugin for vLLM
vLLM Ascend plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM.

By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.

## Contents

- [Quick Start](./quick_start.md)
- [Installation](./installation.md)
- Usage
  - [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
  - [Feature Support](./usage/feature_support.md)
  - [Supported Models](./usage/supported_models.md)
@@ -1,3 +1,23 @@
# Installation


## Building

#### Build Python package from source

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```

#### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```

### Prepare Ascend NPU environment

### Dependencies
17
docs/quick_start.md
Normal file
@@ -0,0 +1,17 @@
# Quick Start

## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)

### Dependencies
| Requirement | Supported version | Recommended version | Note |
|-------------|-------------------|---------------------|------|
| vLLM | main | main | Required for vllm-ascend |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |

Learn how to set up your environment [here](docs/environment.md).
@@ -1 +0,0 @@
TBD
19
docs/usage/feature_support.md
Normal file
@@ -0,0 +1,19 @@
# Feature Support

| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Planned for 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ✗ | Planned for 2025 Q1 |
| Prompt adapter | ✅ | |
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
| Pooling | ✗ | Planned for 2025 Q1 |
| Enc-dec | ✗ | Planned for 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-VL/Qwen2-Audio/InternVL) | Add more model support in 2025 Q1 |
| LogProbs | ✅ | |
| Prompt logProbs | ✅ | |
| Async output | ✅ | |
| Multi step scheduler | ✅ | |
| Best of | ✅ | |
| Beam search | ✅ | |
| Guided Decoding | ✗ | Planned for 2025 Q1 |
1
docs/usage/running_vllm_with_ascend.md
Normal file
@@ -0,0 +1 @@
# Running vLLM with Ascend
24
docs/usage/supported_models.md
Normal file
@@ -0,0 +1,24 @@
# Supported Models

| Model | Supported | Note |
|-------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Needs testing |
| DeepSeek v2.5 | | Needs testing |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Needs testing |
| Baichuan | | Needs testing |
| MiniCPM | | Needs testing |
| InternLM | ✅ | |
| ChatGLM | ✅ | |
| InternVL 2.5 | ✅ | |
| Qwen2-VL | ✅ | |
| GLM-4v | | Needs testing |
| Molmo | ✅ | |
| LLaVA 1.5 | ✅ | |
| Mllama | | Needs testing |
| LLaVA-Next | | Needs testing |
| LLaVA-Next-Video | | Needs testing |
| Phi-3-Vision/Phi-3.5-Vision | | Needs testing |
| Ultravox | | Needs testing |
| Qwen2-Audio | ✅ | |