xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	99e1ea0fe6	[v0.18.0][Misc] Upgrade torch_npu to pre-release built version (#7918 ) ### What this PR does / why we need it? This PR upgrades the `torch_npu` (PTA) version in multiple Dockerfiles to a pre-release build. It introduces logic to dynamically select the correct wheel based on the Python version and system architecture. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with existing tests. The author should verify that the Docker images build successfully for all supported architectures and Python versions. --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-04-01 22:41:09 +08:00
weiguihua2	59a7526339	[CI][Misc] modify ds3.2+dcp ci (#7841 ) ### What this PR does / why we need it? Due to the current dcp solution of allgathering the KV cache, the performance deteriorates significantly, and the CI may get stuck. This PR temporarily removes the performance and accuracy benchmarks for DeepSeek-V3.2-W8A8-cp to prevent CI hangs until optimization is complete. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Verified that the configuration file remains valid and that the CI no longer attempts to run the problematic benchmarks. pick-from: https://github.com/vllm-project/vllm-ascend/pull/7842 --------- Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2026-04-01 08:58:21 +08:00
zhangxinyuehfad	af4278be35	[v0.18.0][CI] Close build image by pr (#7776 ) ### What this PR does / why we need it? Close build image by pr This PR is related to https://github.com/vllm-project/vllm-ascend/pull/7775, please merge them together Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-31 16:38:43 +08:00
zhangxinyuehfad	c1cefd26de	[v0.18.0][CI] Add nightly- prefix to branch/PR image tags (#7765 ) ### What this PR does / why we need it? Add nightly- prefix to branch/PR image tags Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-28 11:31:16 +08:00
zhangxinyuehfad	2c175f5ed8	[v0.18.0][Bugfix] Fix pr triggers on branches for nightly test workflows (#7695 ) ### What this PR does / why we need it? 1. Allow PR triggers on `-dev` and `releases/v` branches for nightly test workflows. 2. fix image-tag in doc --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-27 15:17:06 +08:00
zhangxinyuehfad	d781902ce9	[v0.18.0][CI] Fix releases/v0.18.0 ci test only support vllm v0.18.0 (#7686 ) ### What this PR does / why we need it? Fix releases/v0.18.0 ci test only support vllm v0.18.0 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-26 18:36:04 +08:00
zhangxinyuehfad	124bb00158	[CI][v0.18.0] Build nightly image for releases/v0.18.0 per pr (#7662 ) ### What this PR does / why we need it? This patch adds a per-PR image build for the branch `releases/v0.18.0`. Due to the limitations of the quay naming convention, the image tag cannot be the same as the branch name, so the image tag is named `releases-v0.18.0` for the daily build. Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-26 16:48:51 +08:00
meihanc	114ec75a06	[bugfix][CI] fix '_OpNamespace' 'vllm' object has no attribute 'qkv_rmsnorm_rope' (#7620 ) ### What this PR does / why we need it? Fix '_OpNamespace' 'vllm' object has no attribute 'qkv_rmsnorm_rope' by uninstalling triton. - vLLM version: v0.18.0 - vLLM main: `ed359c497a` Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-03-25 11:05:34 +08:00
Li Wang	8e3f8bab57	[Nightly] Nightly pre-build image (#7388 ) ### What this PR does / why we need it? This pull request refactors the nightly image build and simplifies the logic shared across multiple workflows. 1. The nightly image build becomes a prerequisite when the tests are triggered by `schedule` or `workflow_dispatch`. 2. Simplify the pull request case-selection logic. 3. Next step: implement replaceable nightly tests. Specifically, if nightly tests are manually triggered, they can accept any optional docker image to meet the needs of different commits (which means the image is customizable). ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-25 09:24:01 +08:00
drizzlezyk	54879467c4	[CI] refine issue triage rules, wan regex and update stale setting (#7531 ) - Update issue labeler regex for wan to match numeric suffix only, including both standalone wan label and multi-modality-generate aggregate rule. - Add title-based gate conditions in issue triage workflow so auto-labeling runs only for expected issue templates ( [Bug]: , [Installation]: , [Usage]: , [Doc]: ). - Adjust scheduled stale workflow configuration for the awaiting-feedback processing block. ### What this PR does / why we need it? - Update issue labeler regex for wan to match numeric suffixes only, in both: - standalone wan label rule - multi-modality-generate aggregate rule - Add title-based gate conditions in issue triage workflow so auto-labeling runs only for expected templates: [Bug]:/ [Installation]:/ [Usage]:/ [Doc]: - Adjust the scheduled stale workflow configuration for the awaiting-feedback processing block. ### Does this PR introduce _any_ user-facing change? - No runtime/API user-facing change. - This PR only updates repository automation behavior in GitHub workflows and issue labeling rules. ### How was this patch tested? - Performed config-level validation by reviewing diffs and final YAML content for: - .github/issue-labeler.yml - .github/workflows/bot_issue_manage.yaml - .github/workflows/schedule_stale_manage.yaml - Verified wan regex now requires numeric suffix (e.g., wan2 , wan2.1 ) and no longer matches alphabetic suffix forms (e.g., wana ). - Verified triage workflow includes title-based if conditions for expected issue templates. - Verified stale workflow’s awaiting-feedback block reflects the intended configuration adjustment. - No unit/e2e tests were added because this PR changes GitHub Actions and labeling configuration only. - vLLM version: v0.18.0 - vLLM main: `8b6325758c` --------- Signed-off-by: drizzlezyk <drizzlezyk@163.com>	2026-03-24 20:11:31 +08:00
SILONG ZENG	1e3c1e76bf	[Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511 ) ### What this PR does / why we need it? This PR introduces several upstream `vllm`-aligned lint hooks into `vllm-ascend` and makes them part of the actual `pre-commit` flow. Main changes in this PR: - add `check-boolean-context-manager` to catch boolean expressions in `with` statements - add `check-forbidden-imports` to forbid direct `re` imports and disallowed direct `triton` imports - enable shell script linting through `tools/shellcheck.sh` - add root `.clang-format` aligned with upstream `vllm`, enable `clang-format` in `pre-commit`, temporarily exclude all `csrc/` from `clang-format` to avoid bringing a large native code reformat into this PR This PR focuses on landing the smaller and immediately useful lint alignment first, without mixing in the larger requirements-management migration. ### Does this PR introduce _any_ user-facing change? No. This PR only updates repository lint configuration, static checks, and internal import/style enforcement. It does not change runtime behavior or public interfaces. ### How was this patch tested? Tested locally in the project virtual environment. 
Commands used: ```bash bash format.sh ``` Verified checks passed: ``` bash ruff check...............................................................Passed ruff format..............................................................Passed codespell................................................................Passed typos....................................................................Passed clang-format.............................................................Passed Lint GitHub Actions workflow files.......................................Passed Lint shell scripts.......................................................Passed Lint PNG exports from excalidraw.........................................Passed Check for spaces in all filenames........................................Passed Enforce __init__.py in Python packages...................................Passed Check for forbidden imports..............................................Passed Check for boolean ops in with-statements.................................Passed Suggestion...............................................................Passed - hook id: suggestion - duration: 0s To bypass pre-commit hooks, add --no-verify to git commit. ``` note: clang-format is enabled but currently excludes all csrc/ - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2026-03-24 20:03:01 +08:00
realliujiaxu	5d12446573	[Feat][SP] Support SP for VL MoE models (#7044 ) ### What this PR does / why we need it? 2nd PR for https://github.com/vllm-project/vllm-ascend/issues/5712, extending SP to VL MoE models. ### Does this PR introduce _any_ user-facing change? Remove `sp_threshold` from the additional config and reuse `sp_min_token_num` from vLLM. ### How was this patch tested? - Model: Qwen3-VL-30B-A3B, - TP4 DP2 - 100 reqs - max concurrency 1 \| Seq length \| Mean TTFT (ms) main \| Mean TTFT (ms) this PR \| \|------------\|---------------------\|------------------------\| \| 4k \| 429.40 \| 323.3 \| \| 16k \| 1297.01 \| 911.74 \| - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2026-03-24 17:16:00 +08:00
LeeWenquan	9615bc33fd	Fix Qwen3Next CI Config (#7561 ) ### What this PR does / why we need it? This pr modifies qwen3Next nightly CI config. (1) Add a nightly CI . (2) Set a more precise accuracy standard - vLLM version: v0.18.0 - vLLM main: `6a9cceb219` Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com>	2026-03-24 17:08:17 +08:00
Nengjun Ma	fcba91a392	Main2main Upgrade vllm commit to 0320 17:00 (#7510 ) ### What this PR does / why we need it? Main2main Upgrade vllm commit to 0320 17:00 1. fix vllm refactored `_moe_forward` to call `runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True. vllm PR:"[MoE Refactor] DefaultMoERunner simplification [#33049](https://github.com/vllm-project/vllm/pull/33049)" 2.fix vllm moved the call to `self._set_compile_ranges()` in `VllmConfig.__post_init__` from before `check_and_update_config()` to after it (to allow platforms to lower `max_num_batched_tokens` first). vllm PR: "fix(xpu): Re-compute compile ranges after platform-specific config updates" [#37523](https://github.com/vllm-project/vllm/pull/37523) ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Claude Code <noreply@anthropic.com>	2026-03-23 21:37:41 +08:00
meihanc	e344a53127	[bugfix][CI]Skip e2e log summary when the log file is missing or empty (#7552 ) ### What this PR does / why we need it? Avoid failing `ci_log_summary.py` when the e2e log file is missing or empty. Test in CI :https://github.com/vllm-project/vllm-ascend/actions/runs/23428406256/job/68149271871 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.18.0 - vLLM main: `8b6325758c` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>	2026-03-23 20:25:59 +08:00
zhangxinyuehfad	886756aea0	[Bugfix][CI] Fix aisbench installation to avoid Gitee authentication (#7536 ) ### What this PR does / why we need it? - Pass GITEE_USERNAME (var) and GITEE_TOKEN (secret) as Docker build args in nightly image build so Dockerfile can authenticate to Gitee - In Dockerfile.nightly.a2/a3, embed credentials into clone URL to avoid auth failure during `git clone` - In single-node and multi-node PR test workflows, backup the pre-installed benchmark from the nightly image before wiping vllm-ascend, then restore it instead of re-cloning from Gitee, which is inaccessible from fork PR contexts ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.18.0 - vLLM main: `8b6325758c` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-23 20:16:51 +08:00
liuhy1213-cell	fb283b5820	[CI] Add nightly CI test cases for the GLM-5 (#7429 ) ### What this PR does / why we need it? Add nightly CI test cases for the GLM-5 Add model download for the GLM-5 https://github.com/vllm-project/vllm-ascend/actions/runs/23286178651/job/67710409642#logs - vLLM version: v0.17.0 - vLLM main: `b31e9326a7` --------- Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com> Signed-off-by: liuhy1213-cell <liuhy1213@gmail.com> Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>	2026-03-23 19:14:19 +08:00
Nengjun Ma	8e2c59e1ee	Main2main upgrade vllm commit to 03 19 17:00 (#7478 ) ### What this PR does / why we need it? Upgrade vllm commit to 2026.03.19. 1. Fix socket removed from StatelessProcessGroup. Upstream vLLM PR [#36330](https://github.com/vllm-project/vllm/pull/36330) ("elastic_ep: Fix stateless group port races") refactored StatelessProcessGroup and removed the socket: socket.socket \| None field. The socket ownership was moved to a new create_tcp_store() helper instead of being stored as a field on the dataclass. 2. Fix `virtual_engine` parameter removed from `set_forward_context()`. Upstream [V0 Deprecation] Deprecate virtual engine [#37195](https://github.com/vllm-project/vllm/pull/37195) ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-03-23 16:25:57 +08:00
dependabot[bot]	da866cc168	[CI] Bump docker/build-push-action from 6 to 7 (#7541 ) Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - vLLM version: v0.18.0 - vLLM main: `8b6325758c` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-23 15:46:12 +08:00
Qiu	71df17f4e6	bugfix(MC2): refactor the comm group of MC2 to be compatible with PP (#7291 ) ### What this PR does / why we need it? This PR refactors the communication group of MC2 to keep it consistent with vllm's EP group, making it compatible with PP. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2026-03-23 15:44:21 +08:00
dependabot[bot]	8527b49764	[CI] Bump docker/setup-buildx-action from 3 to 4 (#7542 ) Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - vLLM version: v0.18.0 - vLLM main: `8b6325758c` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-23 15:44:14 +08:00
Shanshan Shen	5c0d02f689	[Bugfix] Fix multi-instance serving OOM on single card (#7427 ) ### What this PR does / why we need it? Fix https://github.com/vllm-project/vllm-ascend/issues/7308. Subtract `init_non_torch_memory` (possibly used by the first instance) from the total `non_torch_memory` when calculating `available_kv_cache_memory`. Directly use `non_torch_memory_increase` (contained in `non_kv_cache_memory`) to calculate `available_kv_cache_memory`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Launch two vllm-ascend instances sequentially on a single card. ```bash # Launch first instance vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B \ --port 8100 \ --host 0.0.0.0 \ --additional-config='{"enable_cpu_binding":true}' \ --gpu-memory-utilization 0.3 \ --max-num-seqs 1 \ --max-model-len 2048 \ --max-num-batched-tokens 2048 \ --no-enable-prefix-caching \ --enforce-eager # Launch second instance vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B \ --port 8101 \ --host 0.0.0.0 \ --additional-config='{"enable_cpu_binding":true}' \ --gpu-memory-utilization 0.3 \ --max-num-seqs 1 \ --max-model-len 2048 \ --max-num-batched-tokens 2048 \ --no-enable-prefix-caching \ --enforce-eager ``` Before this PR: ```bash # First instance: ------------------------------------------------------------------ requested_memory: 18.287109375 GiB non_kv_cache_memory: 1.2340388298034668 GiB init_non_torch_memory: 0.3616676330566406 GiB non_torch_memory_before_empty_cache: 0.3896217346191406 GiB non_torch_memory_increase: 0.0279541015625 GiB non_torch_memory_cleared_by_empty_cache: 0.3616676330566406 GiB ------------------------------------------------------------------ # Second instance: ------------------------------------------------------------------ requested_memory: 18.287109375 GiB non_kv_cache_memory: 1.2336344718933105 GiB init_non_torch_memory: 18.37220001220703 GiB non_torch_memory_before_empty_cache: 
18.399906158447266 GiB non_torch_memory_increase: 0.02754974365234375 GiB non_torch_memory_cleared_by_empty_cache: 18.372356414794922 GiB ------------------------------------------------------------------ # available_kv_cache_memory = requested_memory - non_kv_cache_memory - non_torch_memory_cleared_by_empty_cache Available KV cache memory: -1.32 GiB ``` After this PR: ```bash # First instance: ------------------------------------------------------------------ requested_memory: 18.287109375 GiB non_kv_cache_memory: 1.2340540885925293 GiB init_non_torch_memory: 0.36182403564453125 GiB non_torch_memory_before_empty_cache: 0.38979339599609375 GiB non_torch_memory_increase: 0.0279693603515625 GiB non_torch_memory_cleared_by_empty_cache: 0.0 GiB ------------------------------------------------------------------ # Second instance: ------------------------------------------------------------------ requested_memory: 18.287109375 GiB non_kv_cache_memory: 1.233344554901123 GiB init_non_torch_memory: 18.74309539794922 GiB non_torch_memory_before_empty_cache: 18.770355224609375 GiB non_torch_memory_increase: 0.02725982666015625 GiB non_torch_memory_cleared_by_empty_cache: 0.0 GiB ------------------------------------------------------------------ # available_kv_cache_memory = requested_memory - non_kv_cache_memory - non_torch_memory_cleared_by_empty_cache Available KV cache memory: 17.05 GiB ``` - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>	2026-03-23 14:22:59 +08:00
Li Wang	75fae619d5	[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455 ) ### What this PR does / why we need it? Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-23 09:08:21 +08:00
meihanc	bff4fbfca5	upgrade to 0.18.0 (#7502 ) ### What this PR does / why we need it? 1. upgrade to 0.18.0 2. ensure kernel_block_sizes is int for Eagle drafter ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com>	2026-03-21 16:05:38 +08:00
Li Wang	6ad74e8c80	[CI] Add git safe repo (#7501 ) ### What this PR does / why we need it? Add git safe repo to avoid dubious ownership error - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-20 16:40:24 +08:00
vllm-ascend-ci	95e1dc11d8	[CI]: Auto-update estimated test times in config.yaml (#7413 ) ## Summary This PR was auto-generated by the Update estimated test times [workflow](https://github.com/vllm-project/vllm-ascend/actions/runs/23226502411). It updates the `estimated_time` values in `.github/workflows/scripts/config.yaml` based on actual elapsed times collected from CI workflow runs. ### Methodology - Each e2e test job uploads its elapsed time as a `timing-data-` artifact upon completion. - The workflow aggregates all collected timing artifacts across jobs. - For each test, the median elapsed time is computed to reduce outlier impact. - A 10% safety buffer is applied and the result is rounded to the nearest 10 seconds. ### Review Checklist - [ ] Verify that updated `estimated_time` values are within a reasonable range. - [ ] Confirm no test entries are missing or unexpectedly removed. > If the new values look reasonable, feel free to merge. Otherwise, leave a comment describing the anomaly. - vLLM version: v0.17.0 - vLLM main: `4497431df6` Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-03-19 19:01:16 +08:00
Nengjun Ma	ee804ce23e	Main2main upgrade vllm to 0318 commit (#7412 ) ### What this PR does / why we need it? Upgrade vllm commit to 0318. Main content: some test cases failed because the NPU memory used by previously executed test cases was not released in time. These test cases now run a pre-operation that cleans up the NPU memory and waits (by default, at most 50s) for the cleanup to complete. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-03-19 17:17:36 +08:00
aipaes	87d6424b2e	[CI] Add nightly CI test cases for the GLM-4.7 model. (#7391 ) ### What this PR does / why we need it? Add acc nightly CI test cases for the GLM-4.7 model. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? through CI - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: zjks98 <zhangjiakang4@huawei.com> Co-authored-by: zjks98 <zhangjiakang4@huawei.com>	2026-03-19 16:43:29 +08:00
aipaes	0261d1b1c6	[CI] add glm4.7 weights download (#7395 ) ### What this PR does / why we need it? Download GLM4.7 w8a8 weights for CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? through CI - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: zjks98 <zhangjiakang4@huawei.com> Signed-off-by: aipaes <82140963+aipaes@users.noreply.github.com> Co-authored-by: zjks98 <zhangjiakang4@huawei.com>	2026-03-19 16:43:15 +08:00
zhangxinyuehfad	ce239db4fb	[CI] Add multi-hardware wheel build and release workflow (#7312 ) ### What this PR does / why we need it? Adds a scheduled CI workflow (schedule_release_code_and_wheel.yml) to automatically build and release vllm-ascend source packages and binary wheels for multiple Ascend hardware targets. Key features: 1. Source release: Builds tar.gz sdist and uploads to PyPI on version tag push 2. Multi-hardware wheel builds: Supports three hardware targets in parallel: 2.1 A2 (Ascend 910B): x86_64 + ARM64, Python 3.10 / 3.11 2.2 A3 (Ascend 910C): x86_64 + ARM64, Python 3.10 / 3.11 2.3 310P: x86_64 + ARM64, Python 3.10 / 3.11 3. Wheel repair: Uses auditwheel to produce manylinux-compatible wheels, excluding Ascend NPU runtime libs (libascend.so, libtorch.so, etc.) that must be provided by the runtime environment 4. Variant wheels: Generates hardware-variant wheels via variantlib for hardware-specific distribution 5. OBS upload: Aggregates all variant wheels and a combined index JSON, then uploads to Huawei OBS for hosting ### Does this PR introduce _any_ user-facing change? Yes. Users will be able to install hardware-specific vllm-ascend wheels from PyPI or the OBS variant index, eliminating the need to build from source. ### How was this patch tested? 1. CI verification only — workflow syntax and job dependency logic reviewed manually 2. Wheel build steps validated against existing Dockerfiles (Dockerfile.buildwheel.a2/a3/310p) 3. auditwheel exclusion list verified against known Ascend runtime shared libraries - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com> Signed-off-by: YanZhicong <mryanzhicong@163.com> Co-authored-by: YanZhicong <mryanzhicong@163.com>	2026-03-19 11:06:17 +08:00
LoganJane	270c5cb8cd	[CI] Add nightly CI test cases for the Kimi-K2.5 (#7416 ) ### What this PR does / why we need it? Add nightly CI test cases for the Kimi-K2.5. - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: LoganJane <loganJane73@hotmail.com> Signed-off-by: LoganJane <42287016+LoganJane@users.noreply.github.com>	2026-03-19 11:02:29 +08:00
meihanc	ab9cd2e305	[CI]Add CI summary log (#7202 ) ### What this PR does / why we need it? This PR adds a new CI log summarizer, `ci_log_summary.py`, and wires it into unit-test and e2e workflows so failed jobs publish a structured failure summary to the GitHub step summary. Examples: - `python3 .github/workflows/scripts/ci_log_summary.py --log-file /tmp/unit-test.log --mode ut --step-name "Unit test"` - `python3 .github/workflows/scripts/ci_log_summary.py --run-id 23127187822 --format json` A maintenance note is added to `ci_utils.py` to clarify that the `START` / `PASSED` / `FAILED (exit code X)` log lines are parsed by `ci_log_summary.py`, so any future format changes must be coordinated with the corresponding summarizer regexes. 🤖 Generated with [Codex]<noreply@openai.com> - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Signed-off-by: meihanc <jcccx.cmh@gmail.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 09:32:06 +08:00
Nengjun Ma	8b79d4de52	Main2main upgrade to vllm 0317 afternoon (#7409 ) ### What this PR does / why we need it? 1. fix "TypeError: get_attn_backend() remove variable": [Refactor `check_and_update_config`](https://github.com/vllm-project/vllm/pull/35122) 2. fix [Rename `compile_ranges_split_points` to `compile_ranges_endpoints`](https://github.com/vllm-project/vllm/pull/36027) 3. fix "RuntimeError: device_allocator not a DeviceAllocator": [Replace memory related torch.cuda APIs](https://github.com/vllm-project/vllm/pull/37031) 4. fix [Support multiple KV groups in OffloadingSpec](https://github.com/vllm-project/vllm/pull/36610) removed self.offloaded_block_size and changed self.gpu_block_size from a scalar to a tuple of per-group block sizes, adding block_size_factor. 5. fix [Consolidate SupportsEagle](https://github.com/vllm-project/vllm/pull/36063) renamed get_eagle3_aux_hidden_state_layers() to get_eagle3_default_aux_hidden_state_layers() and added a supports_eagle3() guard before calling it. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? E2E - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Claude Code <noreply@anthropic.com>	2026-03-18 23:24:27 +08:00
jiangmengyu18	305820f1a9	[Bugfix] fix bug about model type of qwen3_vl_8b_instruct_w8a8 (#7383 ) ### What this PR does / why we need it? Adapt to the model type of Qwen3-VL-8B-Instruct-W8A8 - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: betta18 <jiangmengyu1@huawei.com> Co-authored-by: betta18 <jiangmengyu1@huawei.com>	2026-03-18 20:30:03 +08:00
SparrowMu	fb8e22ec00	[DOC] MiniMax-M2.5 model intro (#7296 ) ### What this PR does / why we need it? 1. Add nightly test on MiniMax-M2.5 with deployment method on A3 2. Add MiniMax-M2.5 deployment introduction to vllm-ascend docs - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: limuyuan <limuyuan3@huawei.com> Signed-off-by: SparrowMu <52023119+SparrowMu@users.noreply.github.com> Co-authored-by: limuyuan <limuyuan3@huawei.com>	2026-03-18 20:14:36 +08:00
LoganJane	2916601e6c	[CI] add Kimi-K2.5 weights download (#7406 ) ### What this PR does / why we need it? Add Kimi-K2.5 weights download. - vLLM version: v0.17.0 - vLLM main: `4497431df6` Signed-off-by: LoganJane <loganJane73@hotmail.com>	2026-03-18 18:29:37 +08:00
dependabot[bot]	1ff9e3f25f	[CI] Bump docker/login-action from 3 to 4 (#7299 ) Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-18 17:06:48 +08:00
dependabot[bot]	b3206cd6f6	[CI] Bump actions/setup-python from 5 to 6 (#7298 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-18 17:06:28 +08:00
Li Wang	5894a27bfd	[CI] Add PAT_TOKEN when checkout (#7400 ) ### What this PR does / why we need it? When we check out a fork repo and want to push to it, the pat_token is needed. - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-18 10:31:32 +08:00
zhangyiming	1c954ff264	[main2main] upgrade vllm to 0308 (#7213 ) ### What this PR does / why we need it? Update main2main to vllm 0308. breaks: * https://github.com/vllm-project/vllm/pull/30681 * https://github.com/vllm-project/vllm/pull/35552 remove self.cudagraph_batch_sizes * https://github.com/vllm-project/vllm/pull/35158 clear_metadata -> defer_finalize * https://github.com/vllm-project/vllm/pull/36006 remove CacheConfig.cpu_offload_gb * https://github.com/vllm-project/vllm/pull/35472 * https://github.com/vllm-project/vllm/pull/34552 attn_metadata_builder * https://github.com/vllm-project/vllm/pull/30515 profile_seq_lens * https://github.com/vllm-project/vllm/pull/28053 - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MrZ20 <2609716663@qq.com> Signed-off-by: menogrey <1299267905@qq.com> Co-authored-by: MrZ20 <2609716663@qq.com>	2026-03-18 09:24:43 +08:00
drizzlezyk	79ef41a53d	[CI] add scheduled stale issue management (#7354 ) ### What this PR does / why we need it? 1. issue with "resolved", 7 days stale, 14 days closed after stale with `stale` and `resolved` label. 2. issue with "awaiting-feedback", 7 days stale, 14 days closed after stale with `stale` and `awaiting-feedback` label. Change items: - Add a scheduled stale-management workflow to process resolved and awaiting-feedback issues independently. - Automatically mark inactive issues as stale , post tailored reminder messages, and close issues after a grace period. - Remove source labels when issues become active again, and disable PR stale handling so the automation remains issue-scoped. ### Does this PR introduce _any_ user-facing change? - No API or runtime behavior changes. - This PR only updates GitHub issue automation (labeling and stale management workflow). ### How was this patch tested? - Test locally - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: drizzlezyk <drizzlezyk@163.com>	2026-03-17 23:28:29 +08:00
drizzlezyk	467c815db6	[CI] expand issue labeler rules for feature/model triage (#7356 ) - Replace minimal label rules with a comprehensive keyword-based issue labeler taxonomy. - Add grouped labels for core features and advanced capabilities to improve issue routing. - Expand model-related matching for LLM, multimodal generation, multimodal understanding, audio, and omni scenarios. - Add/normalize regex patterns for common model families (DeepSeek, Kimi, GLM, Qwen, 310p, etc.) to increase auto-label coverage and consistency. ### What this PR does / why we need it? - Expands `.github/issue-labeler.yml` from a minimal set of rules to a richer keyword-based labeling configuration. - Adds grouped label dimensions for: - Core features (e.g., PD disaggregation, KV cache pool, ACLGraph, async scheduler, CPU binding, quantization) - Advanced features (e.g., long sequence, DCP/PCP, MTP/speculative decode) - Model categories (LLM, multimodal generation, multimodal understanding, audio, omni, etc.) - Specific model families (e.g., DeepSeek, Kimi, GLM, Qwen, 310p) - Improves automatic issue triage accuracy and reduces manual label maintenance effort. - Makes issue categorization more consistent for maintainers and contributors. Why needed: - Existing labeler rules were too limited and could not adequately cover current feature/model issue distribution. - Broader and more structured matching helps faster routing, prioritization, and ownership assignment. Fixes #N/A ### Does this PR introduce _any_ user-facing change? - No runtime/API user-facing changes. - This PR only updates GitHub issue automation rules. ### How was this patch tested? - Performed static validation and review of `.github/issue-labeler.yml` structure and regex entries. - Verified that rule groups and label keys are correctly formatted for GitHub issue labeler consumption. - Confirmed that legacy minimal rules were replaced by expanded taxonomy without syntax-breaking YAML changes. - No unit/e2e tests were added because this is repository automation configuration (GitHub labeling rules) rather than application runtime logic. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: drizzlezyk <drizzlezyk@163.com>	2026-03-17 23:28:04 +08:00
rjg-lyh	4d443b9228	[bugfix] restore pr-7029 and fix patch error (#7294 ) ### What this PR does / why we need it? This PR restores #7029, which adds W8A8C8 support for dsv3.2/glm5 using the `lightning_indexer_quant` ops in the pd-mix stage. The original PR was reverted by #7288 because the patch did not work with the recompute scheduler. This PR also fixes the patching issue so that it works correctly with the recompute scheduler. ### Does this PR introduce _any_ user-facing change? Yes. To enable LI C8, users need to set the `enable_sparse_c8` option to `"true"` in `additional_config`. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: rjg-lyh <1318825571@qq.com>	2026-03-16 15:39:42 +08:00
zhaomingyu13	9320365dab	[Test][Feature] Add e2e test for QuaRot model with eagle3 (#7128 ) ### What this PR does / why we need it? Add an e2e test for the QuaRot model with eagle3 that runs both the QuaRot model and the float model, and then compares their acceptance rates. Related PRs adapting the QuaRot model to eagle3: #6914, #7038. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>	2026-03-16 15:35:55 +08:00
Mengqing Cao	e20f0b1a0d	[ReleaseNote] Add release note for v0.17.0rc1 (#7240 ) ### What this PR does / why we need it? This pull request adds the release notes for `v0.17.0rc1`. It also updates version numbers across various documentation files, including `README.md`, `README.zh.md`, `docs/source/community/versioning_policy.md`, and `docs/source/conf.py` to reflect the new release. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e`	2026-03-15 22:47:47 +08:00
pppeng	7e85f2ff97	[CI] Add test_qwen3_5.py (#7133 ) ### What this PR does / why we need it? Add test_qwen3_5.py covering the base tp4 scenarios on Qwen3.5-27B and Qwen3.5-35B-A3B. - vLLM version: main - vLLM main: `4034c3d32e` --------- Signed-off-by: pppeng <zepengliu912@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-15 22:19:02 +08:00
Mengqing Cao	0c299f79b9	Revert "[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029 )" (#7288 ) ### What this PR does / why we need it? This reverts commit `7ed9e9de69`, which introduced an issue where the patch does not work when the recompute scheduler is enabled. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2026-03-15 20:19:09 +08:00
yupeng	29f195a91c	[Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156 ) ### What this PR does / why we need it? Fix the error reported while initializing the qwen3-reranker-0.6b model with `--enable-lora`, and add a test case to verify the fix. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-15 17:55:42 +08:00
Mengqing Cao	986cd45397	[Version] Drop 0.16.0 support (#7153 ) ### What this PR does / why we need it? Drop 0.16.0 support in main - Fix the eagle proposer breakage introduced by https://github.com/vllm-project/vllm/pull/34552. The main change is to use the draft attention group to initialize the attention metadata builder. - Fix the "`ModelRunner` has no attribute `cudagraph_capture_sizes`" error, a bug in vLLM v0.17.0 that was fixed by a later PR, https://github.com/vllm-project/vllm/pull/30515 - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2026-03-13 16:14:15 +08:00
rjg-lyh	7ed9e9de69	[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029 ) ### What this PR does / why we need it? This PR adds W8A8C8 support for dsv3.2/glm5, mainly in the pd-mix stage, using the `lightning_indexer_quant` ops. Because the code for the current PD-disaggregated scenario is still under refactoring and cleanup, this PR prioritizes ensuring the C8 functionality in the pd-mix scenario. The next steps are planned in two parts: ① Once the optimized scatter operator is updated, we will replace the original operator to improve the performance of storing k_scale. ② Once the code logic for the PD-disaggregated scenario becomes stable, we will carry out more comprehensive validation and make appropriate adaptations. In addition, because enabling C8 currently introduces several new operators whose performance still needs improvement, performance may regress in some scenarios. Only after all the operators are fully ready can we ensure that this feature does not cause any performance degradation. At that point, we will enable this feature by default and remove the switch in `additional_config`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: rjg-lyh <1318825571@qq.com>	2026-03-13 14:47:42 +08:00

628 Commits