[Doc] Update feature support list (#650)

1. Remove the Chinese docs. The content is out of date and we don't have
enough time to maintain it.
2. Update the feature support matrix: refresh the content and add V1 status.

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
This commit is contained in:
wangxiyuan
2025-04-26 10:27:29 +08:00
committed by GitHub
parent 3879d9cad9
commit c99c4c8c70
3 changed files with 42 additions and 200 deletions


@@ -1,102 +0,0 @@
# Contributing Guide
## Building and Testing
We recommend building and testing in your local development environment before submitting a PR.
### Environment Setup and Build
In theory, building vllm-ascend is only supported on Linux, because its dependency `torch_npu` only supports Linux.
However, you can still set up a development environment on Linux/Windows/macOS for linting and basic tests, as shown below:
```bash
# Choose a base directory (~/vllm-project/) and create a Python virtual environment
cd ~/vllm-project/
python3 -m venv .venv
source ./.venv/bin/activate
# Clone and install vllm
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/build.txt
VLLM_TARGET_DEVICE="empty" pip install .
cd ..
# Clone and install vllm-ascend
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -r requirements-dev.txt
# Run lint and mypy checks with the following script
bash format.sh
# Build:
# - Currently a full build is only supported on Linux (torch_npu limitation)
# pip install -e .
# - On other operating systems, install while skipping dependencies
#   (build without deps for debugging on other OS)
# pip install -e . --no-deps
# Commit your changes with `-s`
git commit -sm "your commit info"
```
### Testing
Although vllm-ascend CI provides integration tests on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can also run them locally. The easiest way to run these integration tests locally is in a container:
```bash
# Requires an Ascend NPU environment
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
export IMAGE=vllm-ascend-dev-image
export CONTAINER_NAME=vllm-ascend-dev
export DEVICE=/dev/davinci1
# The first build takes about 10 minutes (at 10 MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also speed this up by pointing VLLM_REPO at a mirror
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti $IMAGE bash
cd vllm-ascend
pip install -r requirements-dev.txt
pytest tests/
```
## Developer Certificate of Origin (DCO)
When contributing to this project, you must agree to the DCO. Commits must include a "Signed-off-by:" header certifying agreement with the terms of the DCO.
Using `-s` with `git commit` adds this header automatically.
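As an illustration of what that trailer check amounts to (a hypothetical `has_signoff` helper, not part of the repo's tooling), a commit-msg hook could verify the sign-off like this:

```python
import re

# The trailer that `git commit -s` appends, e.g.:
#   Signed-off-by: Jane Doe <jane@example.com>
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)

def has_signoff(commit_message: str) -> bool:
    """Return True if the message contains a DCO sign-off trailer."""
    return bool(SIGNOFF_RE.search(commit_message))

if __name__ == "__main__":
    msg = "fix typo\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    print(has_signoff(msg))        # True: trailer present
    print(has_signoff("fix typo")) # False: no trailer
```

If the trailer is missing from an already-created commit, `git commit --amend -s` adds it without changing the content.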
## PR Title and Classification
Only specific types of PRs will be reviewed. PR titles are prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Attention]` New features or optimizations for `attention`
- `[Communicator]` New features or optimizations for `communicators`
- `[ModelRunner]` New features or optimizations for `model runner`
- `[Platform]` New features or optimizations for `platform`
- `[Worker]` New features or optimizations for `worker`
- `[Core]` New features or optimizations for core `vllm-ascend` logic (such as `platform, attention, communicators, model runner`)
- `[Kernel]` Changes affecting compute kernels and ops.
- `[Bugfix]` Bug fixes
- `[Doc]` Documentation fixes and updates
- `[Test]` Tests (e.g., unit tests)
- `[CI]` Build or continuous integration improvements
- `[Misc]` PRs whose changes don't fit any of the categories above; please use this prefix sparingly
> [!NOTE]
> If a PR spans multiple categories, please add all relevant prefixes.
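The prefix rules above can be sketched as a title lint (a hypothetical `check_title` helper; any actual CI check may differ):

```python
import re

# Allowed PR title prefixes, per the list above.
PREFIXES = [
    "[Attention]", "[Communicator]", "[ModelRunner]", "[Platform]",
    "[Worker]", "[Core]", "[Kernel]", "[Bugfix]", "[Doc]", "[Test]",
    "[CI]", "[Misc]",
]

def check_title(title: str) -> bool:
    """Return True if the title starts with one or more allowed
    prefixes and is followed by a non-empty description."""
    found = False
    rest = title
    while True:
        # Strip one leading [Prefix] (plus optional whitespace) per pass.
        m = re.match(r"^(\[[A-Za-z]+\])\s*", rest)
        if not m or m.group(1) not in PREFIXES:
            break
        found = True
        rest = rest[m.end():]
    return found and bool(rest.strip())

print(check_title("[Doc] Update feature support list"))  # True
print(check_title("[Core][Bugfix] fix platform init"))   # True
print(check_title("update docs"))                        # False
```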
## Others
You can find more information about contributing to the vLLM Ascend plugin at [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
If you run into any problems while contributing, feel free to submit a PR improving these docs to help other developers.


@@ -1,79 +0,0 @@
# Versioning Policy
Starting from vLLM 0.7.x, the vLLM Ascend Plugin ([vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend)) follows the [PEP 440](https://peps.python.org/pep-0440/) versioning policy and is released in lockstep with vLLM ([vllm-project/vllm](https://github.com/vllm-project/vllm)).
## vLLM Ascend Plugin Versions
vllm-ascend version numbers follow `v[major].[minor].[micro][rcN][.postN]` (e.g., `v0.7.1rc1`, `v0.7.1`, `v0.7.1.post1`):
- **Final releases**: Usually published every 3 months; the timing takes both upstream vLLM releases and the Ascend product software release strategy into account.
- **Pre releases**: Published on demand; a version ending in rcN is the Nth release candidate, an early preview available before the final release.
- **Post releases**: Published on demand after a final release, mainly to fix bugs in that release. This differs from [the policy described in PEP 440](https://peps.python.org/pep-0440/#post-releases): a post release may contain actual bug fixes, because final releases are published in lockstep with vLLM `v[major].[minor].[micro]` versions. Post releases are therefore typically patch versions of a final release.
For example:
- `v0.7.x`: the final release matching vLLM `v0.7.x`.
- `v0.7.1rc1`: the first pre release (early preview) of vllm-ascend.
- `v0.7.1.post1`: a post release of `v0.7.1`.
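As a stdlib-only sketch (real packaging toolchains use full PEP 440 parsers such as `packaging.version`), the scheme above can be matched with a regex:

```python
import re

# Matches the vllm-ascend scheme v[major].[minor].[micro][rcN][.postN],
# e.g. v0.7.1rc1, v0.7.1, v0.7.1.post1 (a simplified subset of PEP 440).
VERSION_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<micro>\d+)"
    r"(?:rc(?P<rc>\d+))?(?:\.post(?P<post>\d+))?$"
)

def parse_version(tag: str) -> dict:
    """Split a version tag into its numeric parts; rc/post are None if absent."""
    m = VERSION_RE.match(tag)
    if m is None:
        raise ValueError(f"not a vllm-ascend version tag: {tag!r}")
    return {k: (int(v) if v is not None else None)
            for k, v in m.groupdict().items()}

print(parse_version("v0.7.1rc1"))     # rc=1  -> a pre release
print(parse_version("v0.7.1.post1"))  # post=1 -> a post release
```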
## Branch Policy
vllm-ascend has two kinds of branches: main and development.
- **main**: the main branch, corresponding to vLLM's main branch, with quality guarded continuously by Ascend CI.
- **vX.Y.Z-dev**: development branches, created for selected vLLM releases; e.g., `v0.7.1-dev` is the vllm-ascend development branch for vLLM `v0.7.1`.
Usually a commit should land on the main branch first and then be backported to the development branches, to keep the maintenance cost as low as possible.
### Branch Maintenance and EOL
A branch is in one of the following three states:
| State | Maintenance window | Note |
|-------------------|----------------------------|----------------------------------------------------------------------|
| Maintained | Roughly 2-3 minor versions | All resolved issues are merged, releases are published, CI is guaranteed |
| Unmaintained | Driven by community need/interest | All resolved issues are merged, no releases are published, no CI commitment |
| End of Life (EOL) | None | The branch no longer accepts any code |
### Branch Status
Note: for `*-dev` branches, vllm-ascend creates a development branch only for selected vLLM releases, not for every one. So some vLLM versions may have no corresponding development branch (e.g., you may only see `0.7.1-dev` / `0.7.3-dev` and no `0.7.2-dev`); this is expected.
Generally, each vLLM minor version (e.g., 0.7) maps to one vllm-ascend branch, which supports its latest release (e.g., we plan to support 0.7.3), as shown below:
| Branch | Status | Note |
|------------|--------------|---------------------|
| main | Maintained | Based on the vLLM main branch, CI guarded |
| v0.7.3-dev | Maintained | Based on vLLM v0.7.3, CI guarded |
| v0.7.1-dev | Unmaintained | Replaced by the v0.7.3-dev branch |
## Documentation Branch Policy
To reduce maintenance cost, **documentation content should stay consistent across all branches; version differences are controlled via variables in [docs/source/conf.py](https://github.com/vllm-project/vllm-ascend/blob/main/docs/source/conf.py)**. This is not easy, but it is a principle we should strive to follow.
| Version | Purpose | Code branch |
|-----|-----|---------|
| latest | Docs for the latest development branch | `vX.Y.Z-dev` (`main` after the first release) |
| version | Docs for historical releases | git tags such as `vX.Y.Z[rcN]` |
| stable (not yet published) | Docs for the latest final release branch | `vX.Y.Z-dev` after the first release |
As shown above:
- `latest` docs: docs for the latest maintained release branch, matching the current maintenance branch `vX.Y.Z-dev` (switched to `main` after the first final release); updated continuously, so docs for the latest release stay available.
- `version` docs: docs for a published release, matching version vX.Y.Z[rcN] (e.g., `v0.7.3`, `v0.7.3rc1`); no longer updated after the release.
- `stable` docs (not yet published): docs for the final release; **these docs may be updated in real time after release**, usually from `vX.Y.Z-dev`. Once stable docs exist, non-stable docs should show a banner at the top: `You are viewing the latest developer preview docs. Click here to view docs for the latest stable release.`
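For example (an illustrative fragment, not the actual file; the variable names here are hypothetical, see the linked conf.py for the real ones), the version differences can be isolated into MyST substitutions defined once in `docs/source/conf.py`:

```python
# docs/source/conf.py (illustrative fragment)
# Keep all version-specific strings in one place so that the
# documentation source stays identical across branches.
myst_substitutions = {
    "vllm_ascend_version": "v0.7.1rc1",  # the tag this doc build targets
    "vllm_version": "v0.7.1",            # the matching vLLM release
}
```

Pages then reference `{{ vllm_ascend_version }}` instead of hard-coding a version, so the same markdown source renders correctly on every branch.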
## Release Compatibility Matrix
The key compatibility matrix of the vLLM Ascend Plugin (`vllm-ascend`) is as follows:
| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|--------------|---------| --- | --- | --- |
| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
## Release Cadence
### Release window for the next final release (`v0.7.x`)
| Date | Event |
|----------|-------------------------------|
| March 2025 | Release candidate, v0.7.3rc1 |
| March 2025 | Final release, matching the latest vLLM 0.7.3 version: v0.7.3 |


@@ -1,21 +1,44 @@
# Feature Support
| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
| Chunked Prefill | ❌ | | | NA | Planned for 2025.03.30 |
| Automatic Prefix Caching | ❌ | | | NA | Planned for 2025.03.30 |
| LoRA | ❌ | | | NA | Planned for 2025.06.30 |
| Prompt adapter | ❌ | | | NA | Planned for 2025.06.30 |
| Speculative decoding | ✅ | | | Basic functions available | Needs full testing |
| Pooling | ✅ | | | Basic functions available (Bert) | Needs full testing and more model support |
| Enc-dec | ❌ | | | NA | Planned for 2025.06.30 |
| Multi Modality | ✅ | | | Basic functions available (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Improve performance and add more model support |
| LogProbs | ✅ | | | Basic functions available | Needs full testing |
| Prompt logProbs | ✅ | | | Basic functions available | Needs full testing |
| Async output | ✅ | | | Basic functions available | Needs full testing |
| Multi step scheduler | ✅ | | | Basic functions available | Needs full testing; find more details in the [<u>blog</u>](https://blog.vllm.ai/2024/09/05/perf-update.html#batch-scheduling-multiple-steps-ahead-pr-7000), [<u>RFC</u>](https://github.com/vllm-project/vllm/issues/6854) and [<u>issue</u>](https://github.com/vllm-project/vllm/pull/7000) |
| Best of | ✅ | | | Basic functions available | Needs full testing |
| Beam search | ✅ | | | Basic functions available | Needs full testing |
| Guided Decoding | ✅ | | | Basic functions available | Find more details in the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
| Tensor Parallel | ✅ | | | Basic functions available | Needs full testing |
| Pipeline Parallel | ✅ | | | Basic functions available | Needs full testing |
The feature support principle of vLLM Ascend is to **stay aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
You can check the [support status of the vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:
| Feature | vLLM V0 Engine | vLLM V1 Engine | Next Step |
|-------------------------------|----------------|----------------|------------------------------------------------------------------------|
| Chunked Prefill | 🚧 WIP | 🚧 WIP | Functional, waiting for CANN 8.1 nnal package release |
| Automatic Prefix Caching | 🚧 WIP | 🚧 WIP | Functional, waiting for CANN 8.1 nnal package release |
| LoRA | 🟢 Functional | 🚧 WIP | [vllm-ascend#396][multilora], CI needed, working on V1 support |
| Prompt adapter                | 🔴 No plan     | 🟡 Planned     | Planned for 2025.06.30 |
| Speculative decoding | 🟢 Functional | 🚧 WIP | CI needed; working on V1 support |
| Pooling                       | 🟢 Functional  | 🟢 Functional  | CI needed and adapting more models; V1 support relies on vLLM support. |
| Enc-dec                       | 🔴 No plan     | 🟡 Planned     | Planned for 2025.06.30 |
| Multi Modality | 🟢 Functional | 🟢 Functional | [Tutorial][multimodal], optimizing and adapting more models |
| LogProbs | 🟢 Functional | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | 🟢 Functional | CI needed |
| Multi step scheduler          | 🟢 Functional  | 🔴 Deprecated  | [vllm#8779][v1_rfc], replaced by the [vLLM V1 Scheduler][v1_scheduler] |
| Best of | 🟢 Functional | 🔴 Deprecated | [vllm#13361][best_of], CI needed |
| Beam search | 🟢 Functional | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | 🟢 Functional | [vllm-ascend#177][guided_decoding] |
| Tensor Parallel | 🟢 Functional | 🟢 Functional | CI needed |
| Pipeline Parallel | 🟢 Functional | 🟢 Functional | CI needed |
| Expert Parallel               | 🔴 No plan     | 🟢 Functional  | CI needed; no plan for V0 support |
| Data Parallel                 | 🔴 No plan     | 🟢 Functional  | CI needed; no plan for V0 support |
| Prefill Decode Disaggregation | 🟢 Functional | 🟢 Functional | 1P1D available, working on xPyD and V1 support. |
| Quantization                  | 🟢 Functional  | 🟢 Functional  | W8A8 available, CI needed; working on support for more quantization methods |
| Graph Mode                    | 🔴 No plan     | 🟢 Functional  | Functional, waiting for CANN 8.1 nnal package release |
| Sleep Mode | 🟢 Functional | 🟢 Functional | level=1 available, CI needed, working on V1 support |
- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🚧 WIP: Under active development.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan / Deprecated: No plan for V0, or deprecated by vLLM V1.
[v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
[multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
[best_of]: https://github.com/vllm-project/vllm/issues/13361
[guided_decoding]: https://github.com/vllm-project/vllm-ascend/issues/177
[v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
[v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
[multilora]: https://github.com/vllm-project/vllm-ascend/issues/396