xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	75fae619d5	[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455 ) ### What this PR does / why we need it? Replace text-match assertions with a two-tier logprob accuracy check: - Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`. - Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol). This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-23 09:08:21 +08:00
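The two-tier check described in #7455 can be sketched roughly as follows. This is a minimal illustration, not the test code from that PR. `TokenStep`, `check_logprob_accuracy`, and the shape of the top-k dictionary are hypothetical stand-ins for whatever per-token logprob records the real test reads from vLLM outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TokenStep:
    """Hypothetical per-position record: chosen token, its logprob, and the
    top-k alternatives (token id -> logprob) returned by the model."""
    token_id: int
    logprob: float
    top_logprobs: Dict[int, float] = field(default_factory=dict)


def check_logprob_accuracy(
    baseline: List[TokenStep],
    compiled: List[TokenStep],
    atol: float,
    decode_atol: Optional[float] = None,
    num_tokens: int = 3,
) -> List[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    if decode_atol is None:
        decode_atol = 2 * atol  # decode tolerance defaults to 2x atol

    # Tier 1: prefill (token 0) must choose the identical token ...
    if baseline[0].token_id != compiled[0].token_id:
        return [f"prefill token mismatch: {baseline[0].token_id} != {compiled[0].token_id}"]
    failures = []
    # ... and its logprob must match within atol.
    if abs(baseline[0].logprob - compiled[0].logprob) > atol:
        failures.append("prefill logprob outside atol")

    # Tier 2: decode positions 1 .. num_tokens - 1.
    for pos in range(1, num_tokens):
        base, comp = baseline[pos], compiled[pos]
        if base.token_id == comp.token_id:
            # Same argmax: compare the chosen-token logprobs directly.
            observed = comp.logprob
        elif base.token_id in comp.top_logprobs:
            # Argmax drifted: use the logprob the compiled model assigned
            # to the baseline token in its top-k distribution.
            observed = comp.top_logprobs[base.token_id]
        else:
            failures.append(f"pos {pos}: baseline token not in compiled top-k")
            continue
        if abs(base.logprob - observed) > decode_atol:
            failures.append(f"pos {pos}: logprob outside decode_atol")
    return failures
```

Looking the baseline token up in the compiled top-k is what lets a near-tie argmax flip pass while a genuinely shifted distribution still fails.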
meihanc	bff4fbfca5	upgrade to 0.18.0 (#7502 ) ### What this PR does / why we need it? 1. upgrade to 0.18.0 2. ensure kernel_block_sizes is int for Eagle drafter ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Signed-off-by: hfadzxy <starmoon_zhang@163.com> Co-authored-by: hfadzxy <starmoon_zhang@163.com>	2026-03-21 16:05:38 +08:00
Li Wang	6ad74e8c80	[CI] Add git safe repo (#7501 ) ### What this PR does / why we need it? Mark the checked-out repository as a git safe directory to avoid the "dubious ownership" error. - vLLM version: v0.17.0 - vLLM main: `8b6325758c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-20 16:40:24 +08:00
vllm-ascend-ci	95e1dc11d8	[CI]: Auto-update estimated test times in config.yaml (#7413 ) ## Summary This PR was auto-generated by the Update estimated test times [workflow](https://github.com/vllm-project/vllm-ascend/actions/runs/23226502411). It updates the `estimated_time` values in `.github/workflows/scripts/config.yaml` based on actual elapsed times collected from CI workflow runs. ### Methodology - Each e2e test job uploads its elapsed time as a `timing-data-` artifact upon completion. - The workflow aggregates all collected timing artifacts across jobs. - For each test, the median elapsed time is computed to reduce outlier impact. - A 10% safety buffer is applied and the result is rounded to the nearest 10 seconds. ### Review Checklist - [ ] Verify that updated `estimated_time` values are within a reasonable range. - [ ] Confirm no test entries are missing or unexpectedly removed. > If the new values look reasonable, feel free to merge. Otherwise, leave a comment describing the anomaly. - vLLM version: v0.17.0 - vLLM main: `4497431df6` Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-03-19 19:01:16 +08:00
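The estimate described under Methodology (median, plus a 10% buffer, rounded to the nearest 10 seconds) can be sketched as below. This is an illustrative reimplementation, not the workflow's actual script; the function names and the half-up tie-breaking rule are assumptions.

```python
import math
import statistics
from typing import Dict, List


def estimate_seconds(elapsed: List[float], buffer_ratio: float = 0.10, step: int = 10) -> int:
    """Median of the observed run times, padded by buffer_ratio and rounded
    to the nearest `step` seconds (ties round up; an assumed rule)."""
    median = statistics.median(elapsed)  # robust to a single slow outlier run
    padded = median * (1 + buffer_ratio)
    return int(math.floor(padded / step + 0.5)) * step


def update_estimates(timings: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each test name to its new estimated_time; tests with no data are skipped."""
    return {name: estimate_seconds(runs) for name, runs in timings.items() if runs}
```

With the median, a single 5000 s straggler among runs of about 200 s barely moves the estimate (230 s), whereas a mean-based estimate would jump to roughly 1540 s.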
Nengjun Ma	ee804ce23e	Main2main upgrade vllm to 0318 commit (#7412 ) ### What this PR does / why we need it? Upgrade the vLLM commit to 0318. Main change: some test cases failed because NPU memory held by previously executed test cases was not released in time, so these cases now run a pre-step that cleans up NPU memory and waits (by default up to 50 s) for the cleanup to complete. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? NA - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2026-03-19 17:17:36 +08:00
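The cleanup-then-wait pre-step can be sketched as a polling loop with a deadline. This is a hedged sketch, not the code from #7412. `release_memory` and `used_bytes` are hypothetical callables (in practice they might wrap the NPU runtime's cache-release and memory-query calls), and the threshold and poll interval are assumptions. Only the 50 s default comes from the description.

```python
import gc
import time
from typing import Callable, Optional


def wait_for_memory_release(
    used_bytes: Callable[[], int],
    threshold_bytes: int,
    release_memory: Optional[Callable[[], None]] = None,
    timeout_s: float = 50.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Clean up, then poll until device memory use drops to threshold_bytes.

    Returns True once memory is released, or False if timeout_s elapses first.
    """
    gc.collect()  # drop unreachable Python objects that may still pin device buffers
    if release_memory is not None:
        release_memory()  # hypothetical hook, e.g. an allocator cache flush
    deadline = clock() + timeout_s
    while used_bytes() > threshold_bytes:
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
    return True
```

Injecting `clock` and `sleep` keeps the helper testable without a device; calling it with the defaults gives the blocking behaviour described in the PR.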
aipaes	87d6424b2e	[CI] Add nightly CI test cases for the GLM-4.7 model. (#7391 ) ### What this PR does / why we need it? Add accuracy nightly CI test cases for the GLM-4.7 model. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? through CI - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: zjks98 <zhangjiakang4@huawei.com> Co-authored-by: zjks98 <zhangjiakang4@huawei.com>	2026-03-19 16:43:29 +08:00
aipaes	0261d1b1c6	[CI] add glm4.7 weights download (#7395 ) ### What this PR does / why we need it? Download GLM4.7 w8a8 weights for CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? through CI - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: zjks98 <zhangjiakang4@huawei.com> Signed-off-by: aipaes <82140963+aipaes@users.noreply.github.com> Co-authored-by: zjks98 <zhangjiakang4@huawei.com>	2026-03-19 16:43:15 +08:00
zhangxinyuehfad	ce239db4fb	[CI] Add multi-hardware wheel build and release workflow (#7312 ) ### What this PR does / why we need it? Adds a scheduled CI workflow (schedule_release_code_and_wheel.yml) to automatically build and release vllm-ascend source packages and binary wheels for multiple Ascend hardware targets. Key features: 1. Source release: Builds tar.gz sdist and uploads to PyPI on version tag push 2. Multi-hardware wheel builds: Supports three hardware targets in parallel: 2.1 A2 (Ascend 910B): x86_64 + ARM64, Python 3.10 / 3.11 2.2 A3 (Ascend 910C): x86_64 + ARM64, Python 3.10 / 3.11 2.3 310P: x86_64 + ARM64, Python 3.10 / 3.11 3. Wheel repair: Uses auditwheel to produce manylinux-compatible wheels, excluding Ascend NPU runtime libs (libascend.so, libtorch.so, etc.) that must be provided by the runtime environment 4. Variant wheels: Generates hardware-variant wheels via variantlib for hardware-specific distribution 5. OBS upload: Aggregates all variant wheels and a combined index JSON, then uploads to Huawei OBS for hosting ### Does this PR introduce _any_ user-facing change? Yes. Users will be able to install hardware-specific vllm-ascend wheels from PyPI or the OBS variant index, eliminating the need to build from source. ### How was this patch tested? 1. CI verification only — workflow syntax and job dependency logic reviewed manually 2. Wheel build steps validated against existing Dockerfiles (Dockerfile.buildwheel.a2/a3/310p) 3. auditwheel exclusion list verified against known Ascend runtime shared libraries - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com> Signed-off-by: YanZhicong <mryanzhicong@163.com> Co-authored-by: YanZhicong <mryanzhicong@163.com>	2026-03-19 11:06:17 +08:00
LoganJane	270c5cb8cd	[CI] Add nightly CI test cases for the Kimi-K2.5 (#7416 ) ### What this PR does / why we need it? Add nightly CI test cases for the Kimi-K2.5. - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: LoganJane <loganJane73@hotmail.com> Signed-off-by: LoganJane <42287016+LoganJane@users.noreply.github.com>	2026-03-19 11:02:29 +08:00
meihanc	ab9cd2e305	[CI]Add CI summary log (#7202 ) ### What this PR does / why we need it? This PR adds a new CI log summarizer, `ci_log_summary.py`, and wires it into unit-test and e2e workflows so failed jobs publish a structured failure summary to the GitHub step summary. Examples: - `python3 .github/workflows/scripts/ci_log_summary.py --log-file /tmp/unit-test.log --mode ut --step-name "Unit test"` - `python3 .github/workflows/scripts/ci_log_summary.py --run-id 23127187822 --format json` A maintenance note is added to `ci_utils.py` to clarify that the `START` / `PASSED` / `FAILED (exit code X)` log lines are parsed by `ci_log_summary.py`, so any future format changes must be coordinated with the corresponding summarizer regexes. 🤖 Generated with [Codex]<noreply@openai.com> - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> Signed-off-by: meihanc <jcccx.cmh@gmail.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 09:32:06 +08:00
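The maintenance note matters because the summarizer is regex-driven: if the `START` / `PASSED` / `FAILED (exit code X)` lines change shape, matching silently stops. The sketch below assumes a hypothetical `STATUS: test-name` layout; the real layout emitted by `ci_utils.py` and the regexes in `ci_log_summary.py` may differ.

```python
import re
from typing import Dict, Iterable

# Assumed line shape, e.g. "FAILED (exit code 2): tests/e2e/test_foo.py".
STATUS_RE = re.compile(
    r"^(?P<status>START|PASSED|FAILED \(exit code (?P<code>\d+)\)):\s+(?P<name>\S+)"
)


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Collect passed, failed (with exit codes), and started-but-unfinished tests."""
    started, passed, failed = [], [], {}
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match is None:
            continue  # ordinary log output
        name = match.group("name")
        if match.group("status") == "START":
            started.append(name)
        elif match.group("status") == "PASSED":
            passed.append(name)
        else:
            failed[name] = int(match.group("code"))
    finished = set(passed) | set(failed)
    # Started but never finished: likely a crash, hang, or job timeout.
    unfinished = [name for name in started if name not in finished]
    return {"passed": passed, "failed": failed, "unfinished": unfinished}
```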
Nengjun Ma	8b79d4de52	Main2main upgrade to vllm 0317 afternoon (#7409 ) ### What this PR does / why we need it? 1. Fix "TypeError: get_attn_backend() remove variable" from [Refactor `check_and_update_config`](https://github.com/vllm-project/vllm/pull/35122). 2. Fix [Rename `compile_ranges_split_points` to `compile_ranges_endpoints`](https://github.com/vllm-project/vllm/pull/36027). 3. Fix "RuntimeError: device_allocator not a DeviceAllocator" from [Replace memory related torch.cuda APIs](https://github.com/vllm-project/vllm/pull/37031). 4. Fix [Support multiple KV groups in OffloadingSpec](https://github.com/vllm-project/vllm/pull/36610): removed self.offloaded_block_size and changed self.gpu_block_size from a scalar to a tuple of per-group block sizes, adding block_size_factor. 5. Fix [Consolidate SupportsEagle](https://github.com/vllm-project/vllm/pull/36063): renamed get_eagle3_aux_hidden_state_layers() to get_eagle3_default_aux_hidden_state_layers() and added a supports_eagle3() guard before calling it. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? E2E - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Claude Code <noreply@anthropic.com>	2026-03-18 23:24:27 +08:00
jiangmengyu18	305820f1a9	[Bugfix] fix bug about model type of qwen3_vl_8b_instruct_w8a8 (#7383 ) ### What this PR does / why we need it? Adapt to the model type of Qwen3-VL-8B-Instruct-W8A8 - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: betta18 <jiangmengyu1@huawei.com> Co-authored-by: betta18 <jiangmengyu1@huawei.com>	2026-03-18 20:30:03 +08:00
SparrowMu	fb8e22ec00	[DOC] MiniMax-M2.5 model intro (#7296 ) ### What this PR does / why we need it? 1. Add nightly test on MiniMax-M2.5 with deployment method on A3 2. Add MiniMax-M2.5 deployment introduction to vllm-ascend docs - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: limuyuan <limuyuan3@huawei.com> Signed-off-by: SparrowMu <52023119+SparrowMu@users.noreply.github.com> Co-authored-by: limuyuan <limuyuan3@huawei.com>	2026-03-18 20:14:36 +08:00
LoganJane	2916601e6c	[CI] add Kimi-K2.5 weights download (#7406 ) ### What this PR does / why we need it? Add Kimi-K2.5 weights download. - vLLM version: v0.17.0 - vLLM main: `4497431df6` Signed-off-by: LoganJane <loganJane73@hotmail.com>	2026-03-18 18:29:37 +08:00
dependabot[bot]	1ff9e3f25f	[CI] Bump docker/login-action from 3 to 4 (#7299 ) Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-18 17:06:48 +08:00
dependabot[bot]	b3206cd6f6	[CI] Bump actions/setup-python from 5 to 6 (#7298 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-18 17:06:28 +08:00
Li Wang	5894a27bfd	[CI] Add PAT_TOKEN when checkout (#7400 ) ### What this PR does / why we need it? When we check out the fork repository and want to push to it, a PAT_TOKEN is required. - vLLM version: v0.17.0 - vLLM main: `4497431df6` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-18 10:31:32 +08:00
zhangyiming	1c954ff264	[main2main] upgrade vllm to 0308 (#7213 ) ### What this PR does / why we need it? Update main2main to vllm 0308. breaks: * https://github.com/vllm-project/vllm/pull/30681 * https://github.com/vllm-project/vllm/pull/35552 remove self.cudagraph_batch_sizes * https://github.com/vllm-project/vllm/pull/35158 clear_metadata -> defer_finalize * https://github.com/vllm-project/vllm/pull/36006 remove CacheConfig.cpu_offload_gb * https://github.com/vllm-project/vllm/pull/35472 * https://github.com/vllm-project/vllm/pull/34552 attn_metadata_builder * https://github.com/vllm-project/vllm/pull/30515 profile_seq_lens * https://github.com/vllm-project/vllm/pull/28053 - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MrZ20 <2609716663@qq.com> Signed-off-by: menogrey <1299267905@qq.com> Co-authored-by: MrZ20 <2609716663@qq.com>	2026-03-18 09:24:43 +08:00
drizzlezyk	79ef41a53d	[CI] add scheduled stale issue management (#7354 ) ### What this PR does / why we need it? 1. Issues labeled "resolved" are marked stale after 7 days of inactivity and closed 14 days later if they still carry the `stale` and `resolved` labels. 2. Issues labeled "awaiting-feedback" are marked stale after 7 days of inactivity and closed 14 days later if they still carry the `stale` and `awaiting-feedback` labels. Change items: - Add a scheduled stale-management workflow that processes resolved and awaiting-feedback issues independently. - Automatically mark inactive issues as stale, post tailored reminder messages, and close issues after a grace period. - Remove source labels when issues become active again, and disable PR stale handling so the automation remains issue-scoped. ### Does this PR introduce _any_ user-facing change? - No API or runtime behavior changes. - This PR only updates GitHub issue automation (labeling and stale management workflow). ### How was this patch tested? - Test locally - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: drizzlezyk <drizzlezyk@163.com>	2026-03-17 23:28:29 +08:00
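The per-issue decision this workflow makes can be sketched as a pure function. This is an assumption-laden illustration, not the workflow's configuration: the action names, and the rule that any activity after the stale mark un-stales the issue, are modeled on the description above, and the real workflow may delegate all of this to an off-the-shelf stale action.

```python
from datetime import datetime, timedelta
from typing import Optional, Set

DAYS_BEFORE_STALE = 7
DAYS_BEFORE_CLOSE = 14
MANAGED_LABELS = {"resolved", "awaiting-feedback"}  # each handled independently


def decide_action(
    labels: Set[str],
    last_activity: datetime,
    stale_since: Optional[datetime],
    now: datetime,
) -> str:
    """Return 'ignore', 'none', 'mark_stale', 'unstale', or 'close'."""
    if not labels & MANAGED_LABELS:
        return "ignore"  # only resolved / awaiting-feedback issues are managed
    if stale_since is None:
        if now - last_activity >= timedelta(days=DAYS_BEFORE_STALE):
            return "mark_stale"  # add `stale` and post a reminder comment
        return "none"
    if last_activity > stale_since:
        return "unstale"  # activity resumed: drop `stale` and the source label
    if now - stale_since >= timedelta(days=DAYS_BEFORE_CLOSE):
        return "close"
    return "none"
```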
rjg-lyh	4d443b9228	[bugfix] restore pr-7029 and fix patch error (#7294 ) ### What this PR does / why we need it? This PR restores #7029, which adds W8A8C8 support for dsv3.2/glm5 using the `lightning_indexer_quant` ops in the pd-mix stage. The original PR was reverted by #7288 because the patch did not work with the recompute scheduler. This PR also fixes the patching issue so that it works correctly with the recompute scheduler. ### Does this PR introduce _any_ user-facing change? Yes. To enable LI C8, users need to set the `enable_sparse_c8` option to `"true"` in `additional_config`. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: rjg-lyh <1318825571@qq.com>	2026-03-16 15:39:42 +08:00
zhaomingyu13	9320365dab	[Test][Feature] Add e2e test for QuaRot model with eagle3 (#7128 ) ### What this PR does / why we need it? Add an e2e test for QuaRot model with eagle3 that runs both the QuaRot model and the float model, and then compares their acceptance rates. Related PRs adapting the QuaRot model to eagle3: #6914, #7038. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>	2026-03-16 15:35:55 +08:00
Mengqing Cao	e20f0b1a0d	[ReleaseNote] Add release note for v0.17.0rc1 (#7240 ) ### What this PR does / why we need it? This pull request adds the release notes for `v0.17.0rc1`. It also updates version numbers across various documentation files, including `README.md`, `README.zh.md`, `docs/source/community/versioning_policy.md`, and `docs/source/conf.py` to reflect the new release. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e`	2026-03-15 22:47:47 +08:00
pppeng	7e85f2ff97	[CI] Add test_qwen3_5.py (#7133 ) ### What this PR does / why we need it? Add test_qwen3_5.py for base scenarios tp4 on Qwen3.5-27B and Qwen3.5-35B-A3B. - vLLM version: main - vLLM main: `4034c3d32e` --------- Signed-off-by: pppeng <zepengliu912@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-15 22:19:02 +08:00
Mengqing Cao	0c299f79b9	Revert "[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029 )" (#7288 ) ### What this PR does / why we need it? This reverts commit `7ed9e9de69`, which introduced an issue: the patch does not work when the recompute scheduler is enabled. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2026-03-15 20:19:09 +08:00
yupeng	29f195a91c	[Bugfix][LoRA] Fix the bug when running Qwen3-Reranker-0.6B with LoRA. (#7156 ) ### What this PR does / why we need it? Fix the error reported when initializing the Qwen3-Reranker-0.6B model with `--enable-lora`, and add a test case to verify the fix. - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2026-03-15 17:55:42 +08:00
Mengqing Cao	986cd45397	[Version] Drop 0.16.0 support (#7153 ) ### What this PR does / why we need it? Drop 0.16.0 support in main. - Fix the eagle proposer breakage introduced by https://github.com/vllm-project/vllm/pull/34552, mainly by using the draft attention group to initialize the attention metadata builder. - Fix the "`ModelRunner` has no attribute `cudagraph_capture_sizes`" error, a bug in vLLM v0.17.0 that was fixed by a later PR, https://github.com/vllm-project/vllm/pull/30515. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2026-03-13 16:14:15 +08:00
rjg-lyh	7ed9e9de69	[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029 ) ### What this PR does / why we need it? This PR adds W8A8C8 support for dsv3.2/glm5, mainly in the pd-mix stage, using the lightning_indexer_quant ops. Because the code for the current PD-disaggregated scenario is still being refactored and cleaned up, this PR prioritizes C8 functionality in the pd-mix scenario. The next steps are planned in three parts: ① Once the optimized scatter operator is updated, we will replace the original operator to improve the performance of storing k_scale. ② Once the code logic for the PD-disaggregated scenario becomes stable, we will carry out more comprehensive validation and make appropriate adaptations. ③ Because enabling C8 currently introduces several new operators whose performance still needs improvement, performance may regress in some scenarios. Only after all the operators are fully ready can we ensure that this feature causes no performance degradation; at that point, we will enable it by default and remove the switch in `additional_config`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: rjg-lyh <1318825571@qq.com>	2026-03-13 14:47:42 +08:00
pppeng	6ee7ffb98a	Add Qwen3_5 to model list (#7130 ) ### What this PR does / why we need it? This PR adds new models such as Qwen3.5-35B-A3B and Qwen3.5-27B to the model list for testing. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: pppeng <60355449+ppppeng@users.noreply.github.com>	2026-03-13 11:42:28 +08:00
Li Wang	1f71da80eb	[CI] Fix server start failure when long weight loading (#7098 ) ### What this PR does / why we need it? When loading large models (e.g., 163 shards), weight loading can exceed the default 600s timeout, and engine startup times out with the error: ```shell TimeoutError: Timed out waiting for engines to send initial message on input socket. ``` Increasing `VLLM_ENGINE_READY_TIMEOUT_S` avoids this. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-13 08:52:56 +08:00
Li Wang	7fe0469e27	[CI][Misc] Use offline mode for model downloads (#7179 ) ### What this PR does / why we need it? 1. For all parts of the current test module that download models through ModelScope, add the `local_files_only` parameter to specify offline mode; this ensures that CI will not fail due to network instability. 2. Install modelscope from a fixed commit until its next release. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Check whether the env or arg `local_files_only` works: 1) set the env: ```shell export HF_HUB_OFFLINE=1 ``` 2) run the script ```python from transformers import PretrainedConfig import huggingface_hub from modelscope.utils.hf_util import patch_hub patch_hub() model="Qwen/Qwen3-0.6B" kwargs = {} config_dict, _ = PretrainedConfig.get_config_dict( model, trust_remote_code=True, local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE, **kwargs, ) print(config_dict) ``` it works well: ```shell 2026-03-06 06:40:12,546 - modelscope - WARNING - We can not confirm the cached file is for revision: master The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
{'architectures': ['Qwen3ForCausalLM'], 'attention_bias': False, 'attention_dropout': 0.0, 'bos_token_id': 151643, 'eos_token_id': 151645, 'head_dim': 128, 'hidden_act': 'silu', 'hidden_size': 1024, 'initializer_range': 0.02, 'intermediate_size': 3072, 'max_position_embeddings': 40960, 'max_window_layers': 28, 'model_type': 'qwen3', 'num_attention_heads': 16, 'num_hidden_layers': 28, 'num_key_value_heads': 8, 'rms_norm_eps': 1e-06, 'rope_scaling': None, 'rope_theta': 1000000, 'sliding_window': None, 'tie_word_embeddings': True, 'torch_dtype': 'bfloat16', 'transformers_version': '4.51.0', 'use_cache': True, 'use_sliding_window': False, 'vocab_size': 151936, '_commit_hash': None} ``` 3) test a model repo that is not cached locally, with `HF_HUB_OFFLINE` enabled: ```python from transformers import PretrainedConfig import huggingface_hub from modelscope.utils.hf_util import patch_hub patch_hub() model="FireRedTeam/FireRed-OCR" kwargs = {} config_dict, _ = PretrainedConfig.get_config_dict( model, trust_remote_code=True, local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE, **kwargs, ) print(config_dict) ``` and the result is as expected: ```shell File "/workspace/demo.py", line 12, in <module> config_dict, _ = PretrainedConfig.get_config_dict( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 189, in patch_get_config_dict model_dir = get_model_dir(pretrained_model_name_or_path, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 164, in get_model_dir model_dir = snapshot_download( ^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 137, in snapshot_download return _snapshot_download( ^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 283, in
_snapshot_download raise ValueError( ValueError: Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable look-ups and downloads online, set 'local_files_only' to False ``` - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-13 08:52:24 +08:00
drizzlezyk	5fe7942bbd	[CI] add action for issue labeler on issue open/edit (#7208 ) ### What this PR does / why we need it? New workflow file `bot_issue_manage.yaml`: - Automatically runs when issues are opened or edited. - Uses the official GitHub Issue Labeler action to categorize issues. Label configuration `issue-labeler.yml`: - Defines regex patterns for model-specific labels (310p, GLM5, Qwen 3.5, DeepSeek, Kimi K2, Kimi K2.5). - Enables automatic issue classification based on title/content matching. ### Does this PR introduce _any_ user-facing change? No. This PR only introduces internal GitHub Actions workflow and configuration changes. There are no API, interface, or behavior changes visible to end users. It purely improves the issue management process on GitHub. ### How was this patch tested? - GitHub Actions workflow syntax is valid and follows the official GitHub documentation - The issue labeler action (github/issue-labeler@v3.4) is a well-maintained official GitHub action - Configuration file follows the expected YAML format for the issue-labeler action - Regex patterns for model names have been verified for correct syntax - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: drizzlezyk <drizzlezyk@163.com>	2026-03-12 20:16:17 +08:00
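The title/content matching configured in `issue-labeler.yml` can be approximated in Python as below. The patterns here are illustrative guesses, not the ones in the PR; they mainly show how to keep a `Kimi K2` pattern from also firing on `Kimi K2.5`.

```python
import re
from typing import Dict, List

# Hypothetical label -> pattern table; the real issue-labeler.yml may differ.
LABEL_PATTERNS: Dict[str, str] = {
    "310p": r"\b310p\b",
    "GLM5": r"\bglm[-_ ]?5\b",
    "Qwen 3.5": r"\bqwen[-_ ]?3\.5\b",
    "DeepSeek": r"\bdeepseek\b",
    "Kimi K2.5": r"\bkimi[-_ ]?k2\.5\b",
    "Kimi K2": r"\bkimi[-_ ]?k2\b(?!\.5)",  # negative lookahead excludes K2.5
}


def labels_for(title: str, body: str = "") -> List[str]:
    """Return the sorted labels whose pattern matches the issue title or body."""
    text = f"{title}\n{body}"
    return sorted(
        label
        for label, pattern in LABEL_PATTERNS.items()
        if re.search(pattern, text, re.IGNORECASE)
    )
```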
Li Wang	88c56e3bf2	[Misc] Fix main lint to make CI happy (#7204 ) ### What this PR does / why we need it? Fix lint failed due to the merging of a previous PR. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-12 18:27:48 +08:00
Li Wang	d866e6b238	[Bugfix] Fixed permission issues with the automatic PR submission workflow (#7142 ) ### What this PR does / why we need it? Automatically submit a pull request via https://github.com/vllm-ascend-ci/vllm-ascend. The workflow is: 1. Generate a new `config.yaml` by running the e2e tests. 2. Push the changed `config.yaml` to a new branch of https://github.com/vllm-ascend-ci/vllm-ascend. 3. Submit a pull request to vllm-ascend via the gh CLI. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-12 17:18:59 +08:00
tfhddd	21fea86b08	feat: [CI] Introduce uv to accelerate pip install (#7127 ) ### What this PR does / why we need it? Integrates uv: Significantly accelerates pip install execution and resolves concurrency issues caused by traditional pip caching mechanisms. Why pip install uc-manager is explicitly added: This project depends on uc-manager. However, installing it via uv pip install uc-manager currently fails due to a known issue. An issue has already been filed with the upstream uv repository to address this. Consequently, we explicitly invoke pip install uc-manager as a temporary workaround to ensure the build succeeds. https://github.com/ModelEngine-Group/unified-cache-management/issues/736 Why use UV_SYSTEM_PYTHON: 1: No virtual environment has been created yet; this configuration has the same effect as directly using `pip install`. - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: tfhddd <2272751277@qq.com>	2026-03-12 16:47:23 +08:00
yupeng	830f39dd70	[Bugfix][LoRA] Fix the issue when enabling LoRA + tp + fully_sharded_loras (#6650 ) ### What this PR does / why we need it? Fix issue #6143. ### Does this PR introduce _any_ user-facing change? Allows starting the server with `--enable-lora`, `--fully-sharded-loras`, and `--tensor_parallel_size 2` together. ### How was this patch tested? pytest -sv tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-11 15:43:15 +08:00
Mengqing Cao	1a83c8e2f5	[CI] Build Image for v0.16.0rc1 (#7155 ) ### What this PR does / why we need it? Build Image for v0.16.0rc1 - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-11 14:48:50 +08:00
SILONG ZENG	90aa048e60	[CI] Skip `test_mooncake_layerwise_connector.py` in `ut` (#7147 ) ### What this PR does / why we need it? The `test_mooncake_layerwise_connector.py` file in the `ut` test will be skipped for now and fixed later. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: MrZ20 <2609716663@qq.com>	2026-03-11 11:46:29 +08:00
Li Wang	881c38d210	[Misc] Download on both hk and guiyang region (#7129 ) ### What this PR does / why we need it? Since the PVC files for Guiyang and Hong Kong are not shared, we need to trigger the download of both regions simultaneously when downloading the model to ensure that the models in all regions are synchronized. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-10 19:22:32 +08:00
zhangxinyuehfad	67d40f23fd	[CI]Upgrade nightly multi-node-tests max-parallel to 2 (#7035 ) ### What this PR does / why we need it? 1. Increase the nightly multi-node test max-parallel from 1 to 2, and fix resource conflicts that arise when tests run concurrently. 2. Fix the parse-trigger job: add an if condition so it only runs on schedule, workflow_dispatch, or PRs labeled nightly-test. 3. Adjust the nightly schedule: shift the trigger time from 24:00 to 23:45 (UTC+8). ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-10 16:25:51 +08:00
dependabot[bot]	3b25ded8b7	[CI] Bump docker/metadata-action from 5 to 6 (#7069 ) Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-10 09:06:04 +08:00
dependabot[bot]	2325bbe79b	[CI] Bump actions/checkout from 4 to 6 (#7070 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-10 09:05:22 +08:00
wanghengkang	c49ce18ea5	[Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977 ) ### What this PR does / why we need it? Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: gcw_61wqY8cy <wanghengkang1@huawei.com>	2026-03-06 14:25:10 +08:00
SILONG ZENG	bd571cf6d6	[Main2Main] Upgrade vLLM to 0303 (#6944 ) ### What this PR does / why we need it? break: - https://github.com/vllm-project/vllm/pull/34102 Disable_full param replaced with valid_modes/invalid_modes API - https://github.com/vllm-project/vllm/pull/35503 Now must return float compilation_time - https://github.com/vllm-project/vllm/pull/35564 New sequence_lengths param added - https://github.com/vllm-project/vllm/pull/33807 A check was performed (if runner_backend != "auto") - https://github.com/vllm-project/vllm/pull/34861 `BaseDeviceCommunicator` now accesses PyTorch's internal `pg_map` to check process group state - https://github.com/vllm-project/vllm/pull/35274 Important change: - https://github.com/vllm-project/vllm/pull/28672 `matcher_utils` directly accesses `torch.ops._C.*` during the import phase. In the Ascend environment, some unregistered ops trigger `AttributeError`, causing e2e initialization failure. https://github.com/vllm-project/vllm-ascend/actions/runs/22607260487/job/65502047131#step:10:2323 https://github.com/vllm-project/vllm/blob/main/vllm/compilation/passes/fusion/matcher_utils.py#L29 This PR adds temporary compatibility placeholders (rms_norm, fused_add_rms_norm, rotate_embedding, static/dynamic fp8 quant, silu_and_mul) to `vllm_ascend/patch/platform/patch_fusion_matcher_compat_ops.py` to ensure no crashes during the import phase. Upstream repairs will be considered later. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` --------- Signed-off-by: MrZ20 <2609716663@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Meihan-chen <jcccx.cmh@gmail.com> Co-authored-by: Claude Code <noreply@anthropic.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com>	2026-03-06 09:08:52 +08:00
zhangxinyuehfad	1e4017e3fa	[CI] support nightly ci for per pr by labels (#6483 ) ### What this PR does / why we need it? This PR refactors the nightly CI workflows (A2 and A3) to support running tests against a specific PR's code, in addition to the existing scheduled/dispatch runs using pre-built images. #### Motivation: Previously, nightly tests could only be triggered by schedule or workflow_dispatch, always using the pre-built nightly image. This change allows developers to trigger nightly tests against their own PR's source code, enabling early validation without waiting for a nightly build. #### Changes Trigger logic (parse-trigger job) A new parse-trigger job is introduced in both schedule_nightly_test_a2.yaml and schedule_nightly_test_a3.yaml to centralize trigger evaluation: `schedule / workflow_dispatch`: runs all tests with the pre-built image (existing behavior preserved) `pull_request (labeled + synchronize)`: runs only when the PR has the nightly-test label and a /nightly [test-names] comment exists (the latest one wins): 1. /nightly or /nightly all: runs all tests 2. /nightly test1 test2: runs only the named tests (comma-wrapped for exact matching) #### How to trigger 1. Add the nightly-test label to your PR 2. Comment /nightly (all tests) or /nightly test1 test2 (specific tests) 3. Re-triggering: add another /nightly comment and push a new commit (synchronize event) ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` --------- Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-05 16:46:37 +08:00
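The comment handling in the parse-trigger logic can be sketched as follows. The sketch covers only the comment-parsing step (the `nightly-test` label check is assumed to happen before it). The function names are hypothetical, and the real parse-trigger job is a workflow step, not this Python.

```python
from typing import List, Optional


def parse_nightly_filter(comments: List[str]) -> Optional[str]:
    """Scan PR comments (oldest first); the latest /nightly comment wins.

    Returns "all", a comma-wrapped list such as ",test1,test2,", or None
    when no /nightly comment exists.
    """
    for body in reversed(comments):
        first_line = body.strip().split("\n", 1)[0]
        tokens = first_line.split()
        if not tokens or tokens[0] != "/nightly":
            continue
        names = tokens[1:]
        if not names or names == ["all"]:
            return "all"
        return "," + ",".join(names) + ","
    return None


def should_run(test_name: str, test_filter: Optional[str]) -> bool:
    """Exact-name membership: wrapping in commas stops 'qwen3' matching 'qwen3-moe'."""
    if test_filter is None:
        return False
    if test_filter == "all":
        return True
    return f",{test_name}," in test_filter
```

A plain substring test would wrongly select `qwen3` when only `qwen3-moe` was requested; the comma wrapping is what makes the match exact.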
zhangxinyuehfad	566c367a10	[CI] Add DeepSeek-V3.2 large EP nightly ci (#6378 ) ### What this PR does / why we need it? Add DeepSeek-V3.2 nightly CI. Fix PD routing to exclude headless nodes when collecting prefiller/decoder IPs. - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2026-03-04 16:15:56 +08:00
SILONG ZENG	95b44d7b73	[bugfix]fix file not found error in nightly of single-node (#6976 ) ### What this PR does / why we need it? 1. The main image build takes approximately two hours, so it needs to start earlier, at 9pm (UTC+8), to ensure that the nightly image build can use the latest main image. ``` bash schedule: # UTC+8: 8am, 12pm, 4pm, 10pm - cron: '0 0,4,8,14 * * *' ``` ---> ``` bash schedule: # UTC+8: 8am, 12pm, 4pm, 9pm - cron: '0 0,4,8,13 * * *' ``` Link: https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641055135#step:8:26 2. The nightly test is encountering the following error: ``` bash ImportError: ascend_transport.so: cannot open shared object file: No such file or directory. ``` The library path needs to be added: ``` bash echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib' >> ~/.bashrc ``` Link: https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641054911#step:7:529 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2026-03-04 11:47:26 +08:00
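The schedule change above is a time-zone conversion: as the comments in the cron blocks show, the cron hours are written in UTC, so each UTC+8 hour is shifted back by 8 (8am, 12pm, 4pm, 9pm become 0, 4, 8, 13). A minimal sketch of that arithmetic follows; the helper name `utc8_to_cron` is hypothetical and not part of the repository.

```bash
# Hypothetical helper, not part of the repository: build a daily cron line
# from hours given in UTC+8, assuming the workflow's cron fields are in UTC.
utc8_to_cron() {
  local h
  local -a utc_hours=()
  for h in "$@"; do
    # Shift back 8 hours, wrapping around midnight.
    utc_hours+=( "$(( (h - 8 + 24) % 24 ))" )
  done
  # Join with commas (the first character of IFS) and run at minute 0 every day.
  local IFS=,
  echo "0 ${utc_hours[*]} * * *"
}
```

Running `utc8_to_cron 8 12 16 21` prints `0 0,4,8,13 * * *`, the new schedule. The plain modulo wrap is only safe because the day fields are all `*`; a schedule pinned to specific days would also need its day shifted when an hour crosses midnight.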
Li Wang	d431d7d526	[CI] Enable auto upgrade e2e estimated time for auto-partition suites (#6840 ) ### What this PR does / why we need it? This patch adds a schedule-triggered workflow that automatically updates the e2e estimated times for better load balancing: 1. The workflow runs the full e2e test suite to get the duration of each test. 2. The script `update_estimated_time.py` updates [config.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/scripts/config.yaml) according to the latest times. 3. The workflow automatically submits a pull request that includes the changes to `config.yaml`. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: `83b47f67b1` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-04 10:38:34 +08:00
SILONG ZENG	859f2c25b9	[Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (#6503 ) ### What this PR does / why we need it? This PR refactors the nightly single-node model tests by migrating test configurations from Python scripts to a more maintainable YAML-based format. Migrated tests (original PR: Python `.py` → YAML `.yaml`): [#3568](https://github.com/vllm-project/vllm-ascend/pull/3568): `test_deepseek_r1_0528_w8a8_eplb.py` → `DeepSeek-R1-0528-W8A8.yaml`; [#3631](https://github.com/vllm-project/vllm-ascend/pull/3631): `test_deepseek_r1_0528_w8a8.py` → `DeepSeek-R1-0528-W8A8.yaml`; [#5874](https://github.com/vllm-project/vllm-ascend/pull/5874): `test_deepseek_r1_w8a8_hbm.py` → `DeepSeek-R1-W8A8-HBM.yaml`; [#3908](https://github.com/vllm-project/vllm-ascend/pull/3908): `test_deepseek_v3_2_w8a8.py` → `DeepSeek-V3.2-W8A8.yaml`; [#5682](https://github.com/vllm-project/vllm-ascend/pull/5682): `test_kimi_k2_thinking.py` → `Kimi-K2-Thinking.yaml`; [#4111](https://github.com/vllm-project/vllm-ascend/pull/4111): `test_mtpx_deepseek_r1_0528_w8a8.py` → `MTPX-DeepSeek-R1-0528-W8A8.yaml`; [#3733](https://github.com/vllm-project/vllm-ascend/pull/3733): `test_prefix_cache_deepseek_r1_0528_w8a8.py` → `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml`; [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543): `test_qwen3_235b_w8a8.py` → `Qwen3-235B-A22B-W8A8.yaml`; [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543): `test_qwen3_235b_a22b_w8a8_eplb.py` → `Qwen3-235B-A22B-W8A8.yaml`; [#3973](https://github.com/vllm-project/vllm-ascend/pull/3973): `test_qwen3_30b_w8a8.py` → `Qwen3-30B-A3B-W8A8.yaml`; [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541): `test_qwen3_32b_int8.py` → `Qwen3-32B-Int8.yaml`; [#3757](https://github.com/vllm-project/vllm-ascend/pull/3757): `test_qwq_32b.py` → `QwQ-32B.yaml`; [#5616](https://github.com/vllm-project/vllm-ascend/pull/5616): `test_qwen3_next_w8a8.py` → `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml`; [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541): `test_qwen2_5_vl_7b.py` → `Qwen2.5-VL-7B-Instruct.yaml`; [#5301](https://github.com/vllm-project/vllm-ascend/pull/5301): `test_qwen2_5_vl_7b_epd.py` → `Qwen2.5-VL-7B-Instruct-EPD.yaml`; [#3707](https://github.com/vllm-project/vllm-ascend/pull/3707): `test_qwen2_5_vl_32b.py` → `Qwen2.5-VL-32B-Instruct.yaml`; [#3676](https://github.com/vllm-project/vllm-ascend/pull/3676): `test_qwen3_32b_int8_a3_feature_stack3.py` → `Qwen3-32B-Int8-A3-Feature-Stack3.yaml`; [#3709](https://github.com/vllm-project/vllm-ascend/pull/3709): `test_prefix_cache_qwen3_32b_int8.py` → `Prefix-Cache-Qwen3-32B-Int8.yaml`; [#5395](https://github.com/vllm-project/vllm-ascend/pull/5395): `test_qwen3_next.py` → `Qwen3-Next-80B-A3B-Instruct-A2.yaml`; [#3474](https://github.com/vllm-project/vllm-ascend/pull/3474): `test_qwen3_32b.py` → `Qwen3-32B.yaml`; [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541): `test_qwen3_32b_int8.py` → `Qwen3-32B-Int8-A2.yaml`. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0 --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2026-03-03 20:13:43 +08:00
Xiaoshuang Wang	f7a8befc20	[CI] Upgrade CANN to 8.5.1 (#6897 ) ### What this PR does / why we need it? [CI] Upgrade CANN to 8.5.1 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing tests. - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: wxsIcey <1790571317@qq.com>	2026-03-03 09:02:42 +08:00
pu-zhe	632801b0ad	[CI][310P] Add 310p tracked files in CI light. (#6923 ) ### What this PR does / why we need it? Add 310p tracked files in CI light. 'vllm_ascend/attention/attention_v1.py' 'vllm_ascend/ops/fused_moe/**' ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI test - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: pu-zhe <zpuaa@outlook.com>	2026-03-02 18:03:46 +08:00


586 Commits