xc-llm-ascend/README.zh.md at 46977f9f06976d5a765113656c5b3d5a95c3f485

Files

Yikun Jiang 46977f9f06 [Doc] Add sphinx build for vllm-ascend (#55 )

### What this PR does / why we need it?

This patch enables the doc build for vllm-ascend

- Add sphinx build for vllm-ascend
- Enable readthedocs for vllm-ascend
- Fix CI:
- exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree
to share your contact information to access this model` which introduce
in
314cfade02
- Install test req to fix
https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770:
      ```
      vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module>
          import pytest_asyncio
      E   ModuleNotFoundError: No module named 'pytest_asyncio'
      ```
  - exclude docs PR

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. test locally:
    ```bash
    # Install dependencies.
    pip install -r requirements-docs.txt
    
    # Build the docs and preview
    make clean; make html; python -m http.server -d build/html/
    ```
    
    Launch browser and open http://localhost:8000/.

2. CI passed with preview:
    https://vllm-ascend--55.org.readthedocs.build/en/55/

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

2025-02-13 18:44:17 +08:00

3.1 KiB

Raw Blame History

vLLM Ascend Plugin

| 关于昇腾 | 开发者 Slack (#sig-ascend) |

English | 中文

最新消息 🔥

[2024/12] 我们正在与 vLLM 社区合作，以支持 [RFC]: Hardware pluggable.

总览

vLLM 昇腾插件 (vllm-ascend) 是一个让vLLM在Ascend NPU无缝运行的后端插件。

此插件是 vLLM 社区中支持昇腾后端的推荐方式。它遵循[RFC]: Hardware pluggable所述原则：通过解耦的方式提供了vLLM对Ascend NPU的支持。

使用 vLLM 昇腾插件，可以让类Transformer、混合专家(MOE)、嵌入、多模态等流行的大语言模型在 Ascend NPU 上无缝运行。

准备

硬件：Atlas 800I A2 Inference系列、Atlas A2 Training系列
软件：
- Python >= 3.9
- CANN >= 8.0.RC2
- PyTorch >= 2.4.0, torch-npu >= 2.4.0
- vLLM (与vllm-ascend版本一致)

在此处，您可以了解如何逐步准备环境。

开始使用

Note

目前，我们正在积极与 vLLM 社区合作以支持 Ascend 后端插件，一旦支持，您可以使用一行命令: pip install vllm vllm-ascend 来完成安装。

通过源码安装:

# 安装vllm main 分支参考文档:
# https://docs.vllm.ai/en/latest/getting_started/installation/cpu/index.html#build-wheel-from-source
git clone --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE=empty pip install .

# 安装vllm-ascend main 分支
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .

运行如下命令使用 Qwen/Qwen2.5-0.5B-Instruct 模型启动服务:

# 设置环境变量 VLLM_USE_MODELSCOPE=true 加速下载
vllm serve Qwen/Qwen2.5-0.5B-Instruct
curl http://localhost:8000/v1/models

请参阅官方文档以获取更多详细信息

贡献

有关更多详细信息，请参阅 CONTRIBUTING，可以更详细的帮助您部署开发环境、构建和测试。

我们欢迎并重视任何形式的贡献与合作：

您可以在这里反馈您的使用体验。
请通过提交问题来告知我们您遇到的任何错误。

许可证

Apache 许可证 2.0，如 LICENSE 文件中所示。

3.1 KiB Raw Blame History Unescape Escape

vLLM Ascend Plugin

总览

准备

开始使用

贡献

许可证

3.1 KiB

Raw Blame History