### What this PR does / why we need it?
1. Allow PR triggers on `*-dev` and `releases/v*` branches for nightly
test workflows.
2. fix image-tag in doc
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
This PR introduces several upstream `vllm`-aligned lint hooks into
`vllm-ascend` and makes them part of the actual `pre-commit` flow.
Main changes in this PR:
- add `check-boolean-context-manager` to catch boolean expressions in
`with` statements
- add `check-forbidden-imports` to forbid direct `re` imports and
disallowed direct `triton` imports
- enable shell script linting through `tools/shellcheck.sh`
- add root `.clang-format` aligned with upstream `vllm`, enable
`clang-format` in `pre-commit`, temporarily **exclude all `csrc/**`**
from `clang-format` to avoid bringing a large native code reformat into
this PR
This PR focuses on landing the smaller and immediately useful lint
alignment first, without mixing in the larger requirements-management
migration.
### Does this PR introduce _any_ user-facing change?
No.
This PR only updates repository lint configuration, static checks, and
internal import/style enforcement. It does not change runtime behavior
or public interfaces.
### How was this patch tested?
Tested locally in the project virtual environment.
Commands used:
```bash
bash format.sh
```
Verified checks passed:
``` bash
ruff check...............................................................Passed
ruff format..............................................................Passed
codespell................................................................Passed
typos....................................................................Passed
clang-format.............................................................Passed
Lint GitHub Actions workflow files.......................................Passed
Lint shell scripts.......................................................Passed
Lint PNG exports from excalidraw.........................................Passed
Check for spaces in all filenames........................................Passed
Enforce __init__.py in Python packages...................................Passed
Check for forbidden imports..............................................Passed
Check for boolean ops in with-statements.................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s
To bypass pre-commit hooks, add --no-verify to git commit.
```
**note:**
clang-format is enabled but currently excludes all csrc/**
- vLLM version: v0.17.0
- vLLM main:
8b6325758c
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
Main2main Upgrade vllm commit to 0320 17:00
1. fix vllm refactored `_moe_forward` to call
`runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True.
vllm PR:"[MoE Refactor] DefaultMoERunner simplification
[#33049](https://github.com/vllm-project/vllm/pull/33049)"
2.fix vllm moved the call to `self._set_compile_ranges()` in
`VllmConfig.__post_init__` from **before** `check_and_update_config()`
to **after** it (to allow platforms to lower `max_num_batched_tokens`
first). vllm PR: "fix(xpu): Re-compute compile ranges after
platform-specific config updates"
[#37523](https://github.com/vllm-project/vllm/pull/37523)
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
- vLLM version: v0.17.0
- vLLM main:
8b6325758c
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
### What this PR does / why we need it?
- Pass GITEE_USERNAME (var) and GITEE_TOKEN (secret) as Docker build
args in nightly image build so Dockerfile can authenticate to Gitee
- In Dockerfile.nightly.a2/a3, embed credentials into clone URL to
avoid auth failure during `git clone`
- In single-node and multi-node PR test workflows, backup the
pre-installed benchmark from the nightly image before wiping
vllm-ascend, then restore it instead of re-cloning from Gitee,
which is inaccessible from fork PR contexts
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM main:
8b6325758c
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Upgrade vllm commit to 2026.03.19.
1.Fix socket removed from StatelessProcessGroup. Upstream vLLM PR
[#36330](https://github.com/vllm-project/vllm/pull/36330) ("elastic_ep:
Fix stateless group port races") refactored StatelessProcessGroup and
removed the socket: socket.socket | None field. The socket ownership was
moved to a new create_tcp_store() helper instead of being stored as a
field on the dataclass.
2.fix `virtual_engine` parameter removed from `set_forward_context().
Upstream [V0 Deprecation] Deprecate virtual engine
[#37195](https://github.com/vllm-project/vllm/pull/37195)
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
- vLLM version: v0.17.0
- vLLM main:
8b6325758c
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it?
Upgrade vllm commit to 0318.
Main content: Added a pre-operation for cleaning up and waiting(default
max 50s) for the completion of the clean up of the NPU memory to some
test cases that failed due to the failure to release the NPU memory in a
timely manner when the previous test cases were executed.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
- vLLM version: v0.17.0
- vLLM main:
4497431df6
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it?
Adds a scheduled CI workflow (schedule_release_code_and_wheel.yml) to
automatically build and release vllm-ascend source packages and binary
wheels for multiple Ascend hardware targets.
Key features:
1. Source release: Builds tar.gz sdist and uploads to PyPI on version
tag push
2. Multi-hardware wheel builds: Supports three hardware targets in
parallel:
2.1 A2 (Ascend 910B): x86_64 + ARM64, Python 3.10 / 3.11
2.2 A3 (Ascend 910C): x86_64 + ARM64, Python 3.10 / 3.11
2.3 310P: x86_64 + ARM64, Python 3.10 / 3.11
3. Wheel repair: Uses auditwheel to produce manylinux-compatible wheels,
excluding Ascend NPU runtime libs (libascend*.so, libtorch*.so, etc.)
that must be provided by the runtime environment
4. Variant wheels: Generates hardware-variant wheels via variantlib for
hardware-specific distribution
5. OBS upload: Aggregates all variant wheels and a combined index JSON,
then uploads to Huawei OBS for hosting
### Does this PR introduce _any_ user-facing change?
Yes. Users will be able to install hardware-specific vllm-ascend wheels
from PyPI or the OBS variant index, eliminating the need to build from
source.
### How was this patch tested?
1. CI verification only — workflow syntax and job dependency logic
reviewed manually
2. Wheel build steps validated against existing Dockerfiles
(Dockerfile.buildwheel.a2/a3/310p)
3. auditwheel exclusion list verified against known Ascend runtime
shared libraries
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: YanZhicong <mryanzhicong@163.com>
Co-authored-by: YanZhicong <mryanzhicong@163.com>
### What this PR does / why we need it?
1.fix "TypeError: get_attn_backend() remove variable": [Refactor
`check_and_update_config`](https://github.com/vllm-project/vllm/pull/35122)
2.fix [Rename `compile_ranges_split_points` to
`compile_ranges_endpoints`](https://github.com/vllm-project/vllm/pull/36027)
3.fix "RuntimeError: device_allocator not a DeviceAllocator":[Replace
memory related torch.cuda
APIs"](https://github.com/vllm-project/vllm/pull/37031)
4.fix [Support multiple KV groups in OffloadingSpec
](https://github.com/vllm-project/vllm/pull/36610) removed
self.offloaded_block_size and changed self.gpu_block_size from a scalar
to a tuple of per-group block sizes, adding block_size_factor.
5.fix [Consolidate
SupportsEagle](https://github.com/vllm-project/vllm/pull/36063) renamed
get_eagle3_aux_hidden_state_layers() to
get_eagle3_default_aux_hidden_state_layers() and added a
supports_eagle3() guard before calling it.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
E2E
- vLLM version: v0.17.0
- vLLM main:
8a680463fa
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
### What this PR does / why we need it?
[CI] Upgrade CANN to 8.5.1
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2
Signed-off-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
upgrade vllm commit to `9562912cead1f11e8540fb91306c5cbda66f0007`
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
all tests passed
- vLLM version: v0.15.0
- vLLM main:
13397841ab
---------
Signed-off-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
This PR upgrades the core vLLM dependency to a newer version from the
main branch (`13397841ab469cecf1ed425c3f52a9ffc38139b5`). This is
necessary to keep our project up-to-date with the latest features and
fixes from upstream vLLM.
1.
ac32e66cf9
pass file is moved.
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
### What this PR does / why we need it?
- This PR removes several self-hosted runner labels from the
`actionlint.yaml` configuration file. These runners are likely no longer
in use, so this change cleans up the configuration and ensures
`actionlint` has an accurate list of available runners.
- Move all Action dockerfiles to one folder
- remove useless `runner` input for e2e test.
- update workflow option version
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This is a configuration change for the CI linter. The correctness will
be verified by `actionlint` running in CI on subsequent pull requests.
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
This PR upgrades the vLLM dependency from `v0.14.1` to `v0.15.0`. This
involves:
- Updating the `VLLM_TAG` in all `Dockerfile`.
- Updating the vLLM version in `docs/source/conf.py`.
- Removing conditional code paths specific to `v0.14.1` across the
codebase, which simplifies maintenance.
- Fix `TypeError: MMEncoderAttention.__init__() got an unexpected
keyword argument 'multimodal_config'` due to
https://github.com/vllm-project/vllm/pull/31972.
- Fix `_shared_experts: 'NoneType' object is not callable` due to
https://github.com/vllm-project/vllm/pull/32082 by
https://github.com/vllm-project/vllm-ascend/pull/6335.
- Fix `ReshapeAndCacheOperation setup failed!` due to
https://github.com/vllm-project/vllm/pull/25954 by overriding attention
metadata slots.
This upgrade is necessary to keep the project aligned with the latest
features, bug fixes, and API changes in the vLLM project.
### Does this PR introduce _any_ user-facing change?
No, this is an internal dependency update and does not introduce any
user-facing changes.
### How was this patch tested?
CI is expected to pass with these changes, ensuring that all existing
tests are successful with the new vLLM version.
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
co-authored-by: shen-shanshan <467638484@qq.com>
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
This patch purpose to optimize the lint check term. The main idea is to
reduce unnecessary installation time.
1. The installation of vllm is not must, only append the path of vllm
src to the `PATHONPATH` is effective
2. This installation of `requirements-dev.txt` is not must, we have a
pre-built image `quay.io/ascend-ci/vllm-ascend:lint` with all the
requirements installed in advance.
**NOTE**: the conditions for triggering image builds are: 1).Daily
scheduled build; 2) Build when requirements are modified; 3) Manual
build. This ensures that the dependencies in our image are up-to-date to
the greatest extent possible.
3. The `mypy` was separated from the `pre-commit` hook for performance
reasons; we found that integrating `mypy` into the `pre-commit` hook
resulted in poor performance.
4. Reduce the CPU core consumption from 16 -> 8
### Does this PR introduce _any_ user-facing change?
The end-to-end lint time was optimized from 20min/per PR to 8min/per PR
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
2c24bc6996
---------
Signed-off-by: wangli <wangli858794774@gmail.com>