### What this PR does / why we need it?
Add initial experimental support for Ascend 310P. This patch squashes the PRs below into one to help validation:
- https://github.com/vllm-project/vllm-ascend/pull/914
- https://github.com/vllm-project/vllm-ascend/pull/1318
- https://github.com/vllm-project/vllm-ascend/pull/1327

### Does this PR introduce _any_ user-facing change?
Users can run vLLM on the Atlas 300I Duo series.

### How was this patch tested?
CI passed with:
- E2E image build for 310P
- CI test on A2 with e2e test and longterm test
- Unit tests are missing because a real 310P image is needed to run them; they will be added in a separate PR later.
- Manual e2e tests:
  - Qwen2.5-7b-instruct, Qwen2.5-0.5b, Qwen3-0.6B, Qwen3-4B, Qwen3-8B: https://github.com/vllm-project/vllm-ascend/pull/914#issuecomment-2942989322
  - Pangu MGoE 72B

The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.

#### ENV information
CANN, NNAL version: 8.1.RC1
> [!IMPORTANT]
> PTA 2.5.1 version >= torch_npu-2.5.1.post1.dev20250528 is required to support the NZ format and calling NNAL operators on 310P.

#### Code example
##### Build vllm-ascend from source code
```shell
# download source code as vllm-ascend
cd vllm-ascend
export SOC_VERSION=Ascend310P3
pip install -v -e .
cd ..
```

##### Run offline inference
```python
from vllm import LLM, SamplingParams

# Sample prompts (in Chinese): "Is the boiling point of water 100 °C? Answer yes or no."
# and "If the armpit temperature is 38 °C, does this person have a fever? Answer yes or no."
prompts = [
    "水的沸点是100摄氏度吗?请回答是或者否。",
    "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
    "水的沸点是100摄氏度吗?请回答是或者否。",
    "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)

# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops cannot support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

---------

Signed-off-by: Vincent Yuan <farawayboat@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: Vincent Yuan <farawayboat@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: shen-shanshan <467638484@qq.com>
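A note on the `compilation_config` entry in the example above: the `custom_ops` list follows an enable/disable convention where `"none"` (or `"all"`) sets the default for all custom ops, and `"+op"`/`"-op"` entries then re-enable or disable individual ops, so the list shown runs with only `rms_norm` and `rotary_embedding` as custom ops. A minimal sketch of that resolution logic (a hypothetical helper for illustration, not vLLM's actual implementation):

```python
# Sketch of how a custom_ops list like ["none", "+rms_norm", "+rotary_embedding"]
# is interpreted. NOTE: simplified illustration, not vLLM's real resolver.
def resolve_custom_ops(spec: list[str], all_ops: set[str]) -> set[str]:
    # "all" enables every op by default; "none" (or no default) enables none.
    enabled = set(all_ops) if "all" in spec else set()
    for item in spec:
        if item.startswith("+"):
            enabled.add(item[1:])       # "+op" re-enables a single op
        elif item.startswith("-"):
            enabled.discard(item[1:])   # "-op" disables a single op
    return enabled

ops = resolve_custom_ops(
    ["none", "+rms_norm", "+rotary_embedding"],
    {"rms_norm", "rotary_embedding", "fused_moe"},
)
print(sorted(ops))  # -> ['rms_norm', 'rotary_embedding']
```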
```yaml
name: 'image / Ubuntu'
# This is a docker build check and publish job:
# 1. PR triggered docker image build check
#    - is for image build check
#    - enabled on main/*-dev branches
#    - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. Branch push triggered image publish
#    - is for branch/dev/nightly images
#    - commits merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. Tag push triggered image publish
#    - is for final release images
#    - publishes when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3|latest / vllm-ascend:v1.2.3rc1
on:
  pull_request:
    branches:
      - 'main'
      - '*-dev'
    paths:
      - '.github/workflows/image_ubuntu.yml'
      - 'Dockerfile'
      - 'vllm_ascend/**'
      - 'setup.py'
      - 'pyproject.toml'
      - 'requirements.txt'
      - 'cmake/**'
      - 'CMakeLists.txt'
      - 'csrc/**'
  push:
    # Publish the image when tagging; the Dockerfile in the tag is built as the tag image
    branches:
      - 'main'
      - '*-dev'
    tags:
      - 'v*'
    paths:
      - '.github/workflows/image_ubuntu.yml'
      - 'Dockerfile'
      - 'vllm_ascend/**'

jobs:
  build:
    name: vllm-ascend image build
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Print
        run: |
          lscpu

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          # TODO(yikun): add more hub images and a note on the release policy for container images
          images: |
            quay.io/ascend/vllm-ascend
          # Notes for test cases
          # https://github.com/marketplace/actions/docker-metadata-action#typeref
          # 1. The branch job publishes per main/*-dev branch commit
          # 2. main and dev pull_requests are build-only, so the tag pr-N is fine
          # 3. Only pep440-matched tags will be published:
          #    - v0.7.1 --> v0.7.1, latest
          #    - pre/post/dev releases (v0.7.1rc1 / v0.7.1rc1.dev1 / v0.7.1.post1): no latest
          #    This follows the rule from vLLM, with prefix v
          # TODO(yikun): the post release might be considered as the latest release
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=pep440,pattern={{raw}}

      - name: Free up disk space
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          docker-images: false

      - name: Build - Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Build - Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Publish - Login to Quay Container Registry
        if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ vars.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}

      - name: Build and push 910b
        uses: docker/build-push-action@v6
        with:
          platforms: >-
            ${{
              github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
              'linux/amd64,linux/arm64' ||
              'linux/amd64'
            }}
          # Use the current repo path as the build context; ensure .git is included
          context: .
          file: Dockerfile
          # Only push on tag and main/*-dev branch pushes
          push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
          labels: ${{ steps.meta.outputs.labels }}
          tags: ${{ steps.meta.outputs.tags }}
          build-args: |
            PIP_INDEX_URL=https://pypi.org/simple
```
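The tagging policy described in the workflow's comments (a final pep440 tag such as `v0.7.1` publishes both its own tag and `latest`, while pre/post/dev releases publish only their own tag) can be sketched as follows. This is a simplified illustration of how `type=pep440,pattern={{raw}}` behaves, not the docker/metadata-action implementation, and the regex is a deliberately narrow stand-in for full PEP 440 matching:

```python
import re

# Matches only a plain final release tag like "v0.7.1" (no rc/dev/post
# suffix). NOTE: simplified stand-in for full PEP 440 version matching.
FINAL_RELEASE = re.compile(r"^v\d+(\.\d+)*$")

def image_tags(git_tag: str) -> list[str]:
    """Return the image tags a pushed git tag would produce (sketch)."""
    tags = [git_tag]
    if FINAL_RELEASE.fullmatch(git_tag):
        tags.append("latest")  # only final releases are also tagged latest
    return tags

print(image_tags("v0.7.1"))     # -> ['v0.7.1', 'latest']
print(image_tags("v0.7.1rc1"))  # -> ['v0.7.1rc1']
```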