### What this PR does / why we need it?
This PR introduces several upstream `vllm`-aligned lint hooks into
`vllm-ascend` and makes them part of the actual `pre-commit` flow.
Main changes in this PR:
- add `check-boolean-context-manager` to catch boolean expressions in
`with` statements
- add `check-forbidden-imports` to forbid direct `re` imports and
disallowed direct `triton` imports
- enable shell script linting through `tools/shellcheck.sh`
- add root `.clang-format` aligned with upstream `vllm`, enable
`clang-format` in `pre-commit`, temporarily **exclude all `csrc/**`**
from `clang-format` to avoid bringing a large native code reformat into
this PR
This PR focuses on landing the smaller and immediately useful lint
alignment first, without mixing in the larger requirements-management
migration.
### Does this PR introduce _any_ user-facing change?
No.
This PR only updates repository lint configuration, static checks, and
internal import/style enforcement. It does not change runtime behavior
or public interfaces.
### How was this patch tested?
Tested locally in the project virtual environment.
Commands used:
```bash
bash format.sh
```
Verified checks passed:
``` bash
ruff check...............................................................Passed
ruff format..............................................................Passed
codespell................................................................Passed
typos....................................................................Passed
clang-format.............................................................Passed
Lint GitHub Actions workflow files.......................................Passed
Lint shell scripts.......................................................Passed
Lint PNG exports from excalidraw.........................................Passed
Check for spaces in all filenames........................................Passed
Enforce __init__.py in Python packages...................................Passed
Check for forbidden imports..............................................Passed
Check for boolean ops in with-statements.................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s
To bypass pre-commit hooks, add --no-verify to git commit.
```
**note:**
clang-format is enabled but currently excludes all csrc/**
- vLLM version: v0.17.0
- vLLM main:
8b6325758c
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
- Keeps enable_cpu_binding default on, but skips binding on non‑ARM CPUs
inside bind_cpus, with a clear log.
- Uses a table-driven binding policy: A3 uses NUMA‑balanced binding;
other device types use NUMA‑affinity binding.
- Updates docs to reflect the exact behavior and adds/updates unit tests
for the new logic.
### Does this PR introduce _any_ user-facing change?
- Yes. CPU binding is now enabled by default via additional_config, and
documented in the user guide.
- CPU binding behavior differs by device type (A3 vs. others).
### How was this patch tested?
Added/updated unit tests:
test_cpu_binding.py
1. test_binding_mode_table covers A2 vs A3 binding mode mapping.
2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when
affinity info is missing.
3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch
detection.
4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus.
test_worker_v1.py
1. Updated mocks for enable_cpu_binding default True to align with new
config default.
- vLLM version: v0.14.1
- vLLM main: d7de043
---------
Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
### What this PR does / why we need it?
Upgrade PTA to 2.9.0
- vLLM version: v0.13.0
- vLLM main:
d68209402d
---------
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### Does this PR introduce _any_ user-facing change?
suffix spec decode method rely on `arctic-inference` library. This PR
add it into requirements to make sure the function works by default
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Signed-off-by: frankie <wangyongsheng686@gmail.com>
### What this PR does / why we need it?
In certain scenarios (such as smoke testing), the source code is used to
update the vllm-ascend version for running updated models (such as
Qwen3-VL). However, vllm and vllm-ascend themselves have no restrictions
on the transformer version, and the transformer will not be updated,
resulting in errors when launching the model.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
### What this PR does / why we need it?
fix fastapi version == 0.123.10(<0.124.0)
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
While using the LLM Compressor quantization tool from the VLLM community
to generate quantized weights, the VLLM Ascend engine needs to be
adapted to support the compressed tensors quantization format.
1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig
in vllm.
2. Support CompressedTensorsW8A8 static weight.
- weight: per-channel, int8, symmetric; activation: per-tensor, int8,
symmetric.
4. Support CompressedTensorsW8A8Dynamic weight.
- weight: per-channel, int8, symmetric; activation: per-token, int8,
symmetric, dynamic.
5. Modify the override_quantization_method in AscendQuantConfig.
Co-authored-by: taoqun110 taoqun@huawei.com
Co-authored-by: chenxi-hh chen464822955@163.com
- vLLM version: v0.11.2
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: chenxi-hh <chen464822955@163.com>
Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
### What this PR does / why we need it?
This PR pins the transformers dependency to 4.57.1.
Reason: CI tests (specifically test_completion_with_prompt_embeds.py)
are failing with an AttributeError: 'dict' object has no attribute
'model_type' when using newer versions of transformers.
The issue stems from a bug in tokenization_utils_base.py where the code
attempts to access the model_type field of a configuration dictionary
(_config) using dot notation (_config.model_type) instead of dictionary
key lookup (_config["model_type"] or _config.get("model_type")). This
occurs in the logic block checking for transformers_version <= 4.57.2.
Pinning the version to 4.57.1 bypasses this buggy code path and restores
CI stability.
Error Traceback:
``` shell
/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2419:
if _is_local and _config.model_type not in [
E AttributeError: 'dict' object has no attribute 'model_type'
```
- vLLM main:
2918c1b49c
Signed-off-by: MrZ20 <2609716663@qq.com>
### What this PR does / why we need it?
Upgrade torch-npu to the official release version 2.7.1
- vLLM version: v0.11.0
- vLLM main:
83f478bb19
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
We notice that sometimes user build vllm-ascend with incorrect torch
version. In this case, the build is passed, but when running the code,
the error `AttributeError: '_OpNamespace' '_C_ascend' object has no
attribute 'weak_ref_tensor'` is raised. Let's force the torch version to
2.7.1 and check the torch version when build from source to fix the
issue.
closes: #3342
- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
vllm requires opencv-python-headless >= 4.11.0 which requires
(numpy<2.3.0,>=2), but vllm-ascend numpy version must be less than
2.0.0, so limit opencv-python-headless less than 4.11.0.86 will fix this
conflict.
### How was this patch tested?
tested by CI
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
we notice that torch npu 0919 doesn't work. This PR revert related
change which rely on 0919 version.
Revert PR: #3295#3205#3102
Related: #3353
- vLLM version: v0.11.0
### What this PR does / why we need it?
Upgrade torch-npu to the newest POC version
### Does this PR introduce _any_ user-facing change?
yes, user need upgrade the pta version as well.
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
- Upgrade to v0.10.0
- Drop v0.9.2 version compatibility
- Add patch for
`vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py`
as workaround of
f3a683b7c9
for v0.10.0 and also add e2e test `test_models_prompt_logprobs`
- Pin transformers<4.54.0 as workaround of
https://github.com/vllm-project/vllm-ascend/issues/2034
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Test locally:
`VLLM_USE_MODELSCOPE=true pytest -sv
tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs`
- CI passed
- vLLM version: v0.9.2
- vLLM main:
7728dd77bb
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1. vLLM commit
45badd05d0
changed the pooling check logic which broken vLLM Ascend.
2. vLLM commit
3e04107d97
requires higher version of transformers. The transformers version bug
has been fixed by
e936e401de.
We can safe to remove the version limit now.
3. vLLM commit
217937221b
added a new input `enable_eplb` for FusedMoe Ops
This PR fix the broken CI.
- vLLM version: v0.9.2
- vLLM main:
6a971ed692
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Fix version conflict on transformers:
`pip._vendor.pkg_resources.ContextualVersionConflict: (transformers
4.53.0 (/usr/local/python3.10.17/lib/python3.10/site-packages),
Requirement.parse('transformers<4.53.0'), {'vllm-ascend'})`
Fix
https://github.com/vllm-project/vllm-ascend/actions/runs/15933263325/job/44947231642
### Does this PR introduce _any_ user-facing change?
Fix broken build
### How was this patch tested?
CI passed with new existing test.
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
- Fix vLLM EPLB break
e9fd658a73
by recovering load_weights back to [v0.9.1
version](07b8fae219)
temporarily.
- Fix transformers>=4.53.0 image processor break
Related: https://github.com/vllm-project/vllm-ascend/issues/1470
- Mirror torch_npu requirements to pyproject.toml
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
This PR update the torch_npu to newest release version
2.5.1.post1.dev20250619 .
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI tested will guarantee the update
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
1. Fix format check error to make format.sh work
2. Add codespell check CI
3. Add the missing required package for vllm-ascend.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
- This PR proposes a P2P version of Disaggregated Prefill based on
llm_datadist which manages data transfer.
- This solution reconstructs previous offline single-node Disaggregated
Prefill solution, and supports multi-node and online serveing now.
- Currently this solution supports 1P1D situation of Deepseek hybrid
parallelism (P: TP+EP, D: DP+EP). Note that xPyD situation is considered
in the solution design, and will be supported soon within v1 engine.
---------
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: ganyi <pleaplusone.gy@gmail.com>
### What this PR does / why we need it?
The torch-npu 2.5.1 are published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev version from vllm-ascend code base
### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1
### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
<!-- Thanks for sending a pull request!
BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html
-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.
- Please clarify why the changes are needed. For instance, the use case
and bug description.
- Fixes #
-->
This PR supports the access of vllm-acend to the piecewise_graph feature
provided by the v1 engine.
1. register unifiled_ascend_attention_with_output for piecewise_graph to
split graph.
2. support NPUGraph to accelerate kernel launch.
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
support npugraph to default, Users can disenable the npugraph feature by
configuring enforce_eager.
This has corresponding requirements for the versions of torch_npu and
CANN, and they need to support graph capture.
### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->
it turn to default
---------
Signed-off-by: Bug Hunter Yan <yanpq@zju.edu.cn>
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
This PR added patch module for vllm
1. platform patch: the patch will be registered when load the platform
2. worker patch: the patch will be registered when worker is started.
The detail is:
1. patch_common: patch for main and 0.8.4 version
4. patch_main: patch for main verison
5. patch_0_8_4: patch for 0.8.4 version
### What this PR does / why we need it?
This PR enable custom ops build by default.
### Does this PR introduce _any_ user-facing change?
Yes, users now install vllm-ascend from source will trigger custom ops
build step.
### How was this patch tested?
By image build and e2e CI
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Set torchvision<0.21.0 to match torch/torch_npu version to resolve
`RuntimeError: operator torchvision::nms does not exist`.
Closes: https://github.com/vllm-project/vllm-ascend/issues/477
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
vLLM bumps numpy version to 2.x:
8427f70493
, this will cause a
`pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4
(/usr/local/python3.10/lib/python3.10/site-packages),
Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when vllm
ascend install. This PR resolved the issue by:
- Set numpy < 2.0.0 to resolve numpy VersionConflict
- Sync requirements and toml
- Reorder
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Closes: https://github.com/vllm-project/vllm-ascend/issues/473
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Fix CI by updating mypy and pining numpy version
_the modification of model_runner_v1 is just to make CI happy_
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed
Signed-off-by: MengqingCao <cmq0113@163.com>
Enable CI on all branch.
Installing with the torch-npu-2.5.1.dev20250218 so that we could enable
CI on all branch and prepare for merging 0.7.1-dev to main
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
This PR updates the dependency version of vllm-ascend on torch-npu, so
that the vllm-ascend can be installed in a later version environment
(like to torch-npu 2.6.0rc1),
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI Test
Signed-off-by: ji-huazhong <hzji210@gmail.com>
### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.
This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.
This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
2. Make mypy work by ignore base_communicator and clear unused deps
3. Several improvements for vllm_ascend_test:
- use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
- Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
cabaf4eff3
### Does this PR introduce _any_ user-facing change?
Yes, init the PR.
### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>