[Docs] Add official doc index (#29)
Add official doc index. Move the release content to the right place.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
README.md (85 lines changed)
@@ -31,20 +31,11 @@ This plugin is the recommended approach for supporting the Ascend backend within
 By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, and Multi-modal LLMs, can run seamlessly on the Ascend NPU.
 
 ## Prerequisites
-### Supported Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------|---------------------|------|
-| vLLM | main | main | Required for vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
-
-Find out more about how to set up your environment [here](docs/environment.md).
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0
+
+Find out more about how to set up your environment step by step [here](docs/installation.md).
 
 ## Getting Started
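The dependency table above sets both minimum and recommended versions; a quick sanity check along the following lines (a sketch — it assumes the packages are already installed and that `torch_npu` exposes `__version__`) shows what a host actually provides:

```bash
# Print installed versions to compare against the dependency table (sketch)
python3 --version
python3 -c "import torch; print('torch', torch.__version__)"
python3 -c "import torch_npu; print('torch-npu', torch_npu.__version__)"
```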
@@ -73,78 +64,14 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-I
 vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```
-Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
+**Please refer to the [Official Docs](./docs/index.md) for more details.**
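Beyond the model-list check above, the running server exposes the full OpenAI-compatible API; a minimal completion request (a sketch — the prompt and `max_tokens` values are illustrative) makes a handy smoke test:

```bash
# Send one completion request to the server started above (illustrative values)
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "The capital of France is", "max_tokens": 16}'
```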
-## Building
-
-#### Build Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build container image from source
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.md) for more details; it is a step-by-step guide to help you set up the development environment, build, and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
-| Llama 3.1/3.2 | ✅ ||
-| Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | | Need test |
-| Molmo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | | Need test |
-| LLaVA-Next | | Need test |
-| LLaVA-Next-Video | | Need test |
-| Phi-3-Vision/Phi-3.5-Vision | | Need test |
-| Ultravox | | Need test |
-| Qwen2-Audio | ✅ ||
-
 ## Contributing
+See [CONTRIBUTING](./CONTRIBUTING.md) for more details; it is a step-by-step guide to help you set up the development environment, build, and test.
 
 We welcome and value any contributions and collaborations:
 - Please feel free to comment [here](https://github.com/vllm-project/vllm-ascend/issues/19) about your usage of the vLLM Ascend Plugin.
 - Please let us know if you encounter a bug by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
-- Please see the guidance on how to contribute in [CONTRIBUTING.md](./CONTRIBUTING.md).
 
 ## License

README.zh.md (87 lines changed)
@@ -30,21 +30,12 @@ The vLLM Ascend plugin (`vllm-ascend`) is a plugin that enables vLLM to run seamlessly on the Ascend NPU
 With the vLLM Ascend plugin, popular large language models, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multi-modal models, can run seamlessly on the Ascend NPU.
 
-## Prerequisites
-### Supported Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------|---------------------|------|
-| vLLM | main | main | Required by vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required by vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required by vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required by vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required by torch-npu and vllm |
-
-Find out more about how to configure your environment [here](docs/environment.zh.md).
+## Preparation
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0
+
+Find out more about how to set up your environment step by step [here](docs/installation.md).
 
 ## Getting Started
@@ -74,78 +65,14 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```
-Please refer to the [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
+**Please refer to the [Official Docs](./docs/index.md) for more details.**
-## Building
-
-#### Build the Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build the container image
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.zh.md) for more details; it is a step-by-step guide to help you set up the development environment, build, and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill | ✗ | Plan in 2025 Q1 |
-| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
-| LoRA | ✗ | Plan in 2025 Q1 |
-| Prompt adapter | ✅ ||
-| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
-| Pooling | ✗ | Plan in 2025 Q1 |
-| Enc-dec | ✗ | Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
-| LogProbs | ✅ ||
-| Prompt logProbs | ✅ ||
-| Async output | ✅ ||
-| Multi step scheduler | ✅ ||
-| Best of | ✅ ||
-| Beam search | ✅ ||
-| Guided Decoding | ✗ | Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-A subset of the supported models is shown here. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 | ✅ ||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
-| Llama 3.1/3.2 | ✅ ||
-| Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm | ✅ ||
-| ChatGLM | ✅ ||
-| InternVL 2.5 | ✅ ||
-| Qwen2-VL | ✅ ||
-| GLM-4v | | Need test |
-| Molmo | ✅ ||
-| LLaVA 1.5 | ✅ ||
-| Mllama | | Need test |
-| LLaVA-Next | | Need test |
-| LLaVA-Next-Video | | Need test |
-| Phi-3-Vision/Phi-3.5-Vision | | Need test |
-| Ultravox | | Need test |
-| Qwen2-Audio | ✅ ||
-
 ## Contributing
+See [CONTRIBUTING](./CONTRIBUTING.md) for more details; it is a step-by-step guide that helps you set up the development environment, build, and test.
 
 We welcome and value any contributions and collaborations:
 - Please feel free to share feedback about your experience [here](https://github.com/vllm-project/vllm-ascend/issues/19).
 - Please let us know if you encounter a bug by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
-- Please see the contribution guide in [CONTRIBUTING.zh.md](./CONTRIBUTING.zh.md).
 
 ## License

docs/environment.zh.md (38 lines deleted)
@@ -1,38 +0,0 @@
-### Ascend NPU environment preparation
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------|---------------------|------|
-| vLLM | main | main | Required by vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required by vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required by vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required by vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required by torch-npu and vllm |
-
-Below are brief notes on installing the recommended software versions:
-
-#### Containerized installation
-
-You can use the [container image](https://hub.docker.com/r/ascendai/cann) directly with a single command:
-
-```bash
-docker run \
-    --name vllm-ascend-env \
-    --device /dev/davinci1 \
-    --device /dev/davinci_manager \
-    --device /dev/devmm_svm \
-    --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -it quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10 bash
-```
-
-You do not need to install `torch` and `torch_npu` manually; they will be installed automatically as dependencies of `vllm-ascend`.
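Inside such a container, installing the plugin from source mirrors the Building section of this change; a minimal sketch (it assumes git and network access are available in the container):

```bash
# Inside the container: install the plugin from source; torch and
# torch_npu are then pulled in as vllm-ascend dependencies (sketch)
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```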
-
-#### Manual installation
-
-Alternatively, you can install manually by configuring your environment as described in the [Ascend installation guide](https://ascend.github.io/docs/sources/ascend/quick_install.html).

docs/index.md (new file, 15 lines)
@@ -0,0 +1,15 @@
+# Ascend plugin for vLLM
+
+vLLM Ascend plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU.
+
+This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing a hardware-pluggable interface that decouples the integration of the Ascend NPU from vLLM.
+
+By using the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, and Multi-modal LLMs, can run seamlessly on the Ascend NPU.
+
+## Contents
+
+- [Quick Start](./quick_start.md)
+- [Installation](./installation.md)
+- Usage
+  - [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
+  - [Feature Support](./usage/feature_support.md)
+  - [Supported Models](./usage/supported_models.md)

docs/installation.md
@@ -1,3 +1,23 @@
+# Installation
+
+## Building
+
+#### Build Python package from source
+
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+pip install -e .
+```
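After the editable install, a quick import check confirms the plugin is visible to Python; this is a sketch, and the module name `vllm_ascend` is an assumption based on the package name:

```bash
# Verify the plugin package imports (module name vllm_ascend is assumed)
python3 -c "import vllm_ascend; print('vllm-ascend import OK')"
```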
+
+#### Build container image from source
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+docker build -t vllm-ascend-dev-image -f ./Dockerfile .
+```
+
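Once built, the image can be started with the NPU device and driver paths mounted in, along the lines of the CANN container example elsewhere in this change (a sketch — device names vary by host):

```bash
# Run the freshly built dev image; mounts mirror the CANN container example
docker run --rm -it \
    --device /dev/davinci1 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    vllm-ascend-dev-image bash
```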
 ### Prepare Ascend NPU environment
 
 ### Dependencies

docs/quick_start.md (new file, 17 lines)
@@ -0,0 +1,17 @@
+# Quick Start
+
+## Prerequisites
+### Supported Devices
+- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+- Atlas 800I A2 Inference series (Atlas 800I A2)
+
+### Dependencies
+| Requirement | Supported version | Recommended version | Note |
+|-------------|-------------------|---------------------|------|
+| vLLM | main | main | Required for vllm-ascend |
+| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
+| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
+| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
+| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
+
+Find out more about how to set up your environment [here](docs/environment.md).

@@ -1 +0,0 @@
-TBD

docs/usage/feature_support.md (new file, 19 lines)
@@ -0,0 +1,19 @@
+# Feature Support
+
+| Feature | Supported | Note |
+|---------|-----------|------|
+| Chunked Prefill | ✗ | Plan in 2025 Q1 |
+| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
+| LoRA | ✗ | Plan in 2025 Q1 |
+| Prompt adapter | ✅ ||
+| Speculative decoding | ✅ | Improve accuracy in 2025 Q1 |
+| Pooling | ✗ | Plan in 2025 Q1 |
+| Enc-dec | ✗ | Plan in 2025 Q1 |
+| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
+| LogProbs | ✅ ||
+| Prompt logProbs | ✅ ||
+| Async output | ✅ ||
+| Multi step scheduler | ✅ ||
+| Best of | ✅ ||
+| Beam search | ✅ ||
+| Guided Decoding | ✗ | Plan in 2025 Q1 |
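Features marked supported above can be exercised through the OpenAI-compatible server; for example, LogProbs can be requested on a completion (a sketch — the server address, model, and values are illustrative):

```bash
# Request per-token log probabilities from a running vllm serve instance
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello", "max_tokens": 8, "logprobs": 3}'
```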

docs/usage/running_vllm_with_ascend.md (new file, 1 line)
@@ -0,0 +1 @@
+# Running vLLM with Ascend

docs/usage/supported_models.md (new file, 24 lines)
@@ -0,0 +1,24 @@
+# Supported Models
+
+| Model | Supported | Note |
+|---------|-----------|------|
+| Qwen 2.5 | ✅ ||
+| Mistral | | Need test |
+| DeepSeek v2.5 | | Need test |
+| Llama 3.1/3.2 | ✅ ||
+| Gemma-2 | | Need test |
+| baichuan | | Need test |
+| minicpm | | Need test |
+| internlm | ✅ ||
+| ChatGLM | ✅ ||
+| InternVL 2.5 | ✅ ||
+| Qwen2-VL | ✅ ||
+| GLM-4v | | Need test |
+| Molmo | ✅ ||
+| LLaVA 1.5 | ✅ ||
+| Mllama | | Need test |
+| LLaVA-Next | | Need test |
+| LLaVA-Next-Video | | Need test |
+| Phi-3-Vision/Phi-3.5-Vision | | Need test |
+| Ultravox | | Need test |
+| Qwen2-Audio | ✅ ||
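Any model marked supported can be launched the same way as in the quick start; for instance, one of the multi-modal entries (a sketch — the model choice and flag are illustrative, not a tested configuration):

```bash
# Serve a multi-modal model from the table above (illustrative)
vllm serve Qwen/Qwen2-VL-7B-Instruct --trust-remote-code
```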