### What this PR does / why we need it?
1. Increase the nightly multi-node test `max-parallel` from 1 to 2, and fix
resource conflicts that arise when tests run concurrently.
2. Fix the parse-trigger job: add an `if` condition so it only runs on
schedule, `workflow_dispatch`, or PRs labeled `nightly-test`.
3. Adjust the nightly schedule: shift the trigger time from 24:00 to 23:45
(UTC+8).
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: gcw_61wqY8cy <wanghengkang1@huawei.com>
### What this PR does / why we need it?
This PR refactors the nightly CI workflows (A2 and A3) to support
running tests against a specific PR's code, in addition to the existing
scheduled/dispatch runs using pre-built images.
#### Motivation:
Previously, nightly tests could only be triggered by schedule or
workflow_dispatch, always using the pre-built nightly image. This change
allows developers to trigger nightly tests against their own PR's source
code, enabling early validation without waiting for a nightly build.
#### Changes
Trigger logic (parse-trigger job)
A new parse-trigger job is introduced in both
schedule_nightly_test_a2.yaml and schedule_nightly_test_a3.yaml to
centralize trigger evaluation:
`schedule` / `workflow_dispatch`: runs all tests with the pre-built image
(existing behavior preserved).
`pull_request` (labeled + synchronize): runs only when the PR has the
`nightly-test` label and a `/nightly [test-names]` comment exists (the latest
one wins):
1. `/nightly` or `/nightly all`: runs all tests
2. `/nightly test1 test2`: runs only the named tests (comma-wrapped for exact
matching)
#### How to trigger
1. Add the `nightly-test` label to your PR
2. Comment `/nightly` (all tests) or `/nightly test1 test2` (specific tests)
3. To re-trigger: add another `/nightly` comment and push a new commit
(synchronize event)
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Add DeepSeek-V3.2 nightly CI.
Fix PD routing to exclude headless nodes when collecting
prefiller/decoder IPs.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
1. The **main image build** takes approximately **two hours**. Its start
time needs to be moved forward to **21:00 (UTC+8)** to ensure that the
nightly image build can use the latest main image.
```bash
schedule:
  # UTC+8: 08:00, 12:00, 16:00, 22:00
  - cron: '0 0,4,8,14 * * *'
```
changes to:
```bash
schedule:
  # UTC+8: 08:00, 12:00, 16:00, 21:00
  - cron: '0 0,4,8,13 * * *'
```
Link:
https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641055135#step:8:26
2. The nightly test is encountering the following error:
```bash
ImportError: ascend_transport.so: cannot open shared object file: No such file or directory.
```
The path needs to be added to `LD_LIBRARY_PATH`:
```bash
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
```
Link:
https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641054911#step:7:529
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
This patch adds a schedule-triggered workflow that automatically updates the
e2e estimated times for better load balancing.
1. The workflow runs the full e2e test suite to get the duration of each
test.
2. The script `update_estimated_time.py` updates the
[config](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/scripts/config.yaml)
according to the latest timings.
3. The workflow automatically submits a pull request that includes the
config changes.
<img width="2484" height="764" alt="image"
src="https://github.com/user-attachments/assets/02f3459c-bb3b-4f8e-9966-8bb2e5c1bbea"
/>
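The update step can be sketched as a merge of measured durations into the config. This is a hypothetical illustration of what `update_estimated_time.py` might do; the key names and JSON shape are assumptions, not the actual schema:

```python
import json

def update_estimated_times(config_text, measured):
    # Merge freshly measured e2e durations (seconds) into the config text.
    # Existing entries are overwritten, new tests are added.
    config = json.loads(config_text)
    for test_name, seconds in measured.items():
        config.setdefault(test_name, {})["estimated_time"] = round(seconds)
    return json.dumps(config, indent=2, sort_keys=True)
```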
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
83b47f67b1
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
[CI] Upgrade CANN to 8.5.1
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
Add 310p tracked files in CI light.
'vllm_ascend/attention/attention_v1.py'
'vllm_ascend/ops/fused_moe/**'
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: pu-zhe <zpuaa@outlook.com>
### What this PR does / why we need it?
Revert the image-build speedup and CI installation related PRs:
```bash
git revert 8835236181
git revert 64fba51275
git revert 263c2f8e8d
git revert 84b00695f8
```
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
---------
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
Fix Docker image merge tag settings to use the branch name as the tag.
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
This pull request addresses several issues within the openEuler
Dockerfiles (`Dockerfile.a3.openEuler` and `Dockerfile.openEuler`) to
ensure correct installation and setup of the Mooncake dependency. The
changes primarily involve fixing incorrect file paths for the
`mooncake_installer.sh` script and streamlining the declaration of the
`MOONCAKE_TAG` build argument, leading to more robust and accurate
container builds.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
1. Refactor the image workflow to use `cache-from` to speed up builds.

All Dockerfiles were refactored at the same time by placing layers that
rarely change before those that change frequently, improving the build
cache hit rate.
2. Refactor E2E tests to use vllm-ascend container images, skipping C
compilation when no C code is changed.

In this case, the job only replaces the vllm-ascend source code
and installs `requirements-dev.txt`, saving about 10 minutes before tests
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
9562912cea
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
Add nightly test for Qwen3-235B-A22B with mooncake layerwise connector.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main:
81786c8774
---------
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
fix(mtp): resolve MTP core bugs and enhance eager mode test cases
1. Resolved critical issues in the eager-mode MTP core execution logic.
2. Fixed functional bugs in the `_update_states_after_model_execute`
function.
3. Updated `test_mtp_qwen3_next.py` to validate the eager-mode
acceptance rate.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0
Signed-off-by: Bowen-Leee <caoshankuangren@gmail.com>
### What this PR does / why we need it?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
9562912cea
Signed-off-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it?
The incorrect regular expression syntax `.*[UE4M3|ue4m3].*` is a character
class, not an alternation: it matches any identifier containing any single
one of the characters `U`, `E`, `4`, `M`, `3`, `|`, `u`, `e`, `m`, so it
ignores far more identifiers than intended:
```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
".*UE8M0.*", ".*[UE4M3|ue4m3].*", ".*eles.*", ".*fo.*", ".*ba.*",
".*ot.*", ".*[Tt]h[rR].*"]
```
Fixed to use alternation:
```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
".*UE8M0.*", ".*(UE4M3|ue4m3).*", ".*eles.*", ".*fo.*", ".*ba.*",
".*ot.*", ".*[Tt]h[rR].*"]
```
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
9562912cea
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
upgrade vllm commit to `9562912cead1f11e8540fb91306c5cbda66f0007`
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
all tests passed
- vLLM version: v0.15.0
- vLLM main:
13397841ab
---------
Signed-off-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
This PR adds a case of Qwen3-30B W8A8 with the Mooncake mempool; we need
to test it regularly.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test
- vLLM version: v0.14.1
- vLLM main:
d68209402d
Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
### What this PR does / why we need it?
This PR upgrades the core vLLM dependency to a newer version from the
main branch (`13397841ab469cecf1ed425c3f52a9ffc38139b5`). This is
necessary to keep our project up-to-date with the latest features and
fixes from upstream vLLM.
1. ac32e66cf9: the pass file was moved.
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
This PR adds DCP support to the SFA backend.
Please note that due to operator constraints, the current implementation
has to all-gather the entire KV cache and modify the block table to
satisfy the operator input requirements. This results in significantly
increased communication overhead and peak memory usage. Therefore, this
is only a temporary workaround and will be refactored once the operator
provides proper support.
Additionally, because of the above limitations,
`cp_kv_cache_interleave_size` is currently required to be equal to
`block_size`. This restriction will also be removed after the refactor.
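The temporary restriction can be expressed as a simple configuration check. This is a sketch; the function name and error message are assumptions, not the actual vllm-ascend code:

```python
def validate_sfa_dcp_config(cp_kv_cache_interleave_size, block_size):
    # Temporary restriction from this PR: with DCP enabled on the SFA
    # backend, the interleave size must equal the block size. This check
    # will be lifted once the operator provides proper support.
    if cp_kv_cache_interleave_size != block_size:
        raise ValueError(
            f"cp_kv_cache_interleave_size ({cp_kv_cache_interleave_size}) "
            f"must equal block_size ({block_size}) when DCP is enabled "
            "on the SFA backend"
        )
```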
#### Test
accuracy test using DeepSeek-V3.2-Exp-W8A8 with dp2tp8dcp8
| dataset | version | metric | mode | vllm-api-general-stream |
|----- | ----- | ----- | ----- | -----|
| gsm8kdataset | - | accuracy | gen | 96.35 |
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0
---------
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
### What this PR does / why we need it?
This pull request significantly enhances the test suite by adding new
end-to-end test cases for Qwen3 models on the 310P hardware platform.
The primary goal is to ensure the stability and correctness of these
models under diverse operational conditions, including various
parallelism strategies, data types, and quantization methods.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
E2E test
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0
---------
Signed-off-by: pu-zhe <zpuaa@outlook.com>
### What this PR does / why we need it?
This PR adds disaggregated encoder tests for Qwen2.5-VL-7B-Instruct
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test
by running ci
- vLLM version: release/v0.12.0
---------
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
### What this PR does / why we need it?
- This PR removes several self-hosted runner labels from the
`actionlint.yaml` configuration file. These runners are likely no longer
in use, so this change cleans up the configuration and ensures
`actionlint` has an accurate list of available runners.
- Move all Action Dockerfiles into one folder
- Remove the useless `runner` input for the e2e test
- Update the workflow option version
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This is a configuration change for the CI linter. The correctness will
be verified by `actionlint` running in CI on subsequent pull requests.
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
This patch bumps the Mooncake version to the latest
[release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1)
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Tested locally:
```python
>>> from mooncake.engine import TransferEngine
```
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
This PR updates the CI runner from `linux-aarch64-a2-*` to
`linux-aarch64-a2b3-*` in various test configuration files. This change
is necessary to adapt to updates in the CI infrastructure.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The changes are configuration updates for CI tests. The correctness will
be verified by the CI pipeline.
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
[CI] Update doctest from 0.9.1 to 0.13.0, and copy doc test workflow to
nightly CI for better monitor.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
Add E2E for Prefix Caching cp & Chunked Prefill cp
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: F.Liu <liufeng248@huawei.com>
Signed-off-by: Feng Liu <46866849+ader47@users.noreply.github.com>
Co-authored-by: F.Liu <liufeng248@huawei.com>
### What this PR does / why we need it?
- Introduced 310P W8A8 quantization support: new modules and methods
enable W8A8 static quantization specifically for the Ascend 310P platform.
- Platform-specific quantization configuration loading: the system now
dynamically loads the appropriate quantization configurations
(`AscendCompressedTensorsConfig`, `AscendModelSlimConfig`) based on whether
the current hardware is an Ascend 310P device.
- Implemented `AscendW8A8LinearMethod310P`: a dedicated linear quantization
method for 310P, handling the specifics of weight and activation
quantization, including input parameter broadcasting and weight data
manipulation.
- Extended `AscendModelSlimConfig` for 310P: a specialized configuration
class for 310P integrates the new W8A8 linear method for both standard
linear layers and vocabulary parallel embeddings, ensuring proper
quantization application.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: Tflowers-0129 <2906339855@qq.com>
Signed-off-by: Shaoxu Cheng <2906339855@qq.com>
### What this PR does / why we need it?
1. Disable fail-fast (exiting early on the first error) so that all tests
run to completion.
2. Within each partition, tests are re-sorted by `estimated_time` in
ascending order. This allows the CI to cover as many test cases as
possible in the early stages.
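The re-sorting idea can be sketched as follows. This is a hypothetical illustration; the actual partitioning script and its data shapes may differ:

```python
def balance_and_sort(tests, num_partitions):
    # Greedily balance total estimated_time across partitions, then sort
    # each partition ascending so cheap tests give early coverage.
    partitions = [[] for _ in range(num_partitions)]
    loads = [0.0] * num_partitions
    # assign the longest tests first, each to the currently lightest partition
    for name, t in sorted(tests.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))
        partitions[i].append((name, t))
        loads[i] += t
    # within each partition, run short tests first
    return [sorted(p, key=lambda kv: kv[1]) for p in partitions]
```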
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
While using the LLM Compressor quantization tool from the vLLM community
to generate quantized weights, the vLLM Ascend engine needs to be
adapted to support the compressed-tensors quantization format.
1. Support MoE model W4A8 dynamic weights.
- vLLM version: v0.13.0
- vLLM main:
bde38c11df
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
This PR upgrades the vLLM dependency from `v0.14.1` to `v0.15.0`. This
involves:
- Updating the `VLLM_TAG` in all `Dockerfile`.
- Updating the vLLM version in `docs/source/conf.py`.
- Removing conditional code paths specific to `v0.14.1` across the
codebase, which simplifies maintenance.
- Fix `TypeError: MMEncoderAttention.__init__() got an unexpected
keyword argument 'multimodal_config'` due to
https://github.com/vllm-project/vllm/pull/31972.
- Fix `_shared_experts: 'NoneType' object is not callable` due to
https://github.com/vllm-project/vllm/pull/32082 by
https://github.com/vllm-project/vllm-ascend/pull/6335.
- Fix `ReshapeAndCacheOperation setup failed!` due to
https://github.com/vllm-project/vllm/pull/25954 by overriding attention
metadata slots.
This upgrade is necessary to keep the project aligned with the latest
features, bug fixes, and API changes in the vLLM project.
### Does this PR introduce _any_ user-facing change?
No, this is an internal dependency update and does not introduce any
user-facing changes.
### How was this patch tested?
CI is expected to pass with these changes, ensuring that all existing
tests are successful with the new vLLM version.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
co-authored-by: shen-shanshan <467638484@qq.com>
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Make the image and wheel build CI jobs work via workflow_dispatch.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
This PR removes the custom `ProfileExecuteDuration` utility and its
usages across the codebase. This utility was used for profiling
execution duration of different stages in the inference process. It is
replaced by the standard `vllm.v1.utils.record_function_or_nullcontext`,
which integrates with PyTorch's profiler.
This change simplifies the code by removing a custom implementation in
favor of an upstream utility, improving maintainability. Associated
documentation and tests for `ProfileExecuteDuration` are also removed.
### Does this PR introduce _any_ user-facing change?
`VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` env is removed now.
### How was this patch tested?
CI passed. The changes are a cleanup and replacement with a standard
utility. Existing tests cover the functionality. The removed feature had
its own tests which are also removed.
Related RFC: #5304
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
The underlying tags for nightly image builds have been corrected, and
some useless and confusing workflow fields have been removed.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
Signed-off-by: wangli <wangli858794774@gmail.com>