xc-llm-ascend/Dockerfile.a3

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

FROM quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11

ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
ARG COMPILE_CUSTOM_KERNELS=1
ARG MOONCAKE_TAG=v0.3.7.post2
ARG SOC_VERSION="ascend910_9391"
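
# Each ARG above can be overridden at build time with `--build-arg`.
# A minimal sketch, assuming the build runs from the repository root and this
# file is named Dockerfile.a3; the image tag `vllm-ascend:a3` and the use of
# https://pypi.org/simple as the index are illustrative assumptions:
#
#   docker build -f Dockerfile.a3 \
#     --build-arg PIP_INDEX_URL=https://pypi.org/simple \
#     --build-arg COMPILE_CUSTOM_KERNELS=1 \
#     -t vllm-ascend:a3 .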

COPY . /vllm-workspace/vllm-ascend/
# Define environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV COMPILE_CUSTOM_KERNELS=${COMPILE_CUSTOM_KERNELS} \
    SOC_VERSION=$SOC_VERSION

RUN pip config set global.index-url ${PIP_INDEX_URL}

WORKDIR /workspace

# Install build dependencies, then build and install Mooncake
RUN apt-get update -y && \
    apt-get install -y git vim wget net-tools gcc g++ cmake libnuma-dev && \
    git clone --depth 1 --branch ${MOONCAKE_TAG} https://github.com/kvcache-ai/Mooncake /vllm-workspace/Mooncake && \
    cp /vllm-workspace/vllm-ascend/tools/mooncake_installer.sh /vllm-workspace/Mooncake/ && \
    cd /vllm-workspace/Mooncake && bash mooncake_installer.sh -y && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/lib64 && \
    mkdir -p build && cd build && cmake .. -DUSE_ASCEND_DIRECT=ON && \
    make -j$(nproc) && make install && \
    rm -fr /vllm-workspace/Mooncake/build && \
    rm -rf /var/cache/apt/* && \
    rm -rf /var/lib/apt/lists/*

# Install vLLM
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.11.0
RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /vllm-workspace/vllm
# On x86, vLLM installs triton as a dependency, but triton does not work correctly on Ascend, so uninstall it.
RUN VLLM_TARGET_DEVICE="empty" python3 -m pip install -v -e /vllm-workspace/vllm/[audio] --extra-index-url https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip uninstall -y triton && \
    python3 -m pip cache purge

# Install vllm-ascend
# Append `libascend_hal.so` path (devlib) to LD_LIBRARY_PATH
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
    source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    source /usr/local/Ascend/nnal/atb/set_env.sh && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    cd /vllm-workspace/vllm-ascend/csrc/vnpu_offload && \
    make install && make clean && \
    python3 -m pip install -v -e /vllm-workspace/vllm-ascend/ --extra-index-url https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip cache purge

ENV VLLM_ASCEND_ENABLE_NZ=0 \
    VLLM_WORKER_MULTIPROC_METHOD=spawn \
    VLLM_ASCEND_ENABLE_VNPU=1

# Install modelscope (for fast download) and ray (for multinode)
RUN python3 -m pip install modelscope 'ray>=2.47.1,<=2.48.0' 'protobuf>3.20.0' && \
    python3 -m pip cache purge

CMD ["/bin/bash"]

# Change history (condensed from the blame view of this file, oldest first):
# - 2025-07-17 11:13:02 +08:00  [Platform] Add support for Altlas A3 series (#1794)
#     Adds support for Ascend A3 and removes the latest tag, so users can run vLLM on the Atlas A3 series.
#     Tested via CI: latest-tag removal test, E2E image build for A3, and e2e and long-term tests on A3.
#     No unit test, because one requires real A3 hardware.
#     Closes: https://github.com/vllm-project/vllm-ascend/issues/1696
#     vLLM version: v0.9.2. Signed-off-by: Icey
# - 2025-10-09 10:41:19 +08:00  [CI] Update vLLM to v0.11.0 (#3315)
#     First of three planned steps toward supporting the newest vLLM. Signed-off-by: wangxiyuan
# - 2025-11-21 17:53:00 +08:00  [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (#4324) (#4341)
#     Installs vLLM with the audio extra by default; the message states the image grows by only about 2.x MB.
#     Signed-off-by: Ting FU
# - 2025-11-21 22:48:57 +08:00  [v0.11.0]Upgrade cann to 8.3.rc2 (#4332)
#     Signed-off-by: MrZ20
# - 2025-12-01 11:39:40 +08:00  [Image][Build] Cherry pick #4062 from main (#4506)
#     Integrates Mooncake v0.3.7.post2 (https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2) into the vllm-ascend images.
#     Signed-off-by: wangli
# - 2026-01-22 12:07:03 +00:00  update other platforms' Dockerfile
# - 2026-02-11 06:27:58 +00:00  add env vars & misc