43 Commits

Author SHA1 Message Date
Li Wang
2c2d8bb015 [cherry-pick][CI] Enforce torchaudio and torchvison compatible with pta (#7688)
### What this PR does / why we need it?
This patch cherry-pick from #7648 

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-27 11:06:13 +08:00
SILONG ZENG
1e3c1e76bf [Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511)
### What this PR does / why we need it?
This PR introduces several upstream `vllm`-aligned lint hooks into
`vllm-ascend` and makes them part of the actual `pre-commit` flow.

Main changes in this PR:
- add `check-boolean-context-manager` to catch boolean expressions in
`with` statements
- add `check-forbidden-imports` to forbid direct `re` imports and
disallowed direct `triton` imports
- enable shell script linting through `tools/shellcheck.sh`
- add root `.clang-format` aligned with upstream `vllm`, enable
`clang-format` in `pre-commit`, temporarily **exclude all `csrc/**`**
from `clang-format` to avoid bringing a large native code reformat into
this PR

This PR focuses on landing the smaller and immediately useful lint
alignment first, without mixing in the larger requirements-management
migration.

### Does this PR introduce _any_ user-facing change?
No.

This PR only updates repository lint configuration, static checks, and
internal import/style enforcement. It does not change runtime behavior
or public interfaces.

### How was this patch tested?
Tested locally in the project virtual environment.

Commands used:
```bash
bash format.sh
```
Verified checks passed:
``` bash
ruff check...............................................................Passed
ruff format..............................................................Passed
codespell................................................................Passed
typos....................................................................Passed
clang-format.............................................................Passed
Lint GitHub Actions workflow files.......................................Passed
Lint shell scripts.......................................................Passed
Lint PNG exports from excalidraw.........................................Passed
Check for spaces in all filenames........................................Passed
Enforce __init__.py in Python packages...................................Passed
Check for forbidden imports..............................................Passed
Check for boolean ops in with-statements.................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s

To bypass pre-commit hooks, add --no-verify to git commit.
```
**note:**
clang-format is enabled but currently excludes all csrc/**


- vLLM version: v0.17.0
- vLLM main:
8b6325758c

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-24 20:03:01 +08:00
Frank Chen
3da2ba22eb [Platform] Enable ARM-only CPU binding with NUMA-balanced A3 policy and update docs/tests (#6686)
### What this PR does / why we need it?

- Keeps enable_cpu_binding default on, but skips binding on non‑ARM CPUs
inside bind_cpus, with a clear log.
- Uses a table-driven binding policy: A3 uses NUMA‑balanced binding;
other device types use NUMA‑affinity binding.
- Updates docs to reflect the exact behavior and adds/updates unit tests
for the new logic.

### Does this PR introduce _any_ user-facing change?

- Yes. CPU binding is now enabled by default via additional_config, and
documented in the user guide.
- CPU binding behavior differs by device type (A3 vs. others).

### How was this patch tested?

Added/updated unit tests:

test_cpu_binding.py
1.   test_binding_mode_table covers A2 vs A3 binding mode mapping.
2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when
affinity info is missing.
3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch
detection.
4.   test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus.

test_worker_v1.py
1. Updated mocks for enable_cpu_binding default True to align with new
config default.

- vLLM version: v0.14.1
- vLLM main: d7de043

---------

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
2026-02-25 11:15:14 +08:00
Icey
ee59429015 upgrade main to 0212 (#6712)
### What this PR does / why we need it?
Fixes `transformers_utils/processors/__init__` import error, due to
https://github.com/vllm-project/vllm/pull/33247
Fixes Fused MoE break introduced by `MoERunner abstraction,` due to
https://github.com/vllm-project/vllm/pull/32344

> delete AscendMoERunnere when
https://github.com/vllm-project/vllm/pull/35178 is merged

Fixes `Make Qwen3VL compatible with Transformers v5`, due to
https://github.com/vllm-project/vllm/pull/34262

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-02-25 09:17:29 +08:00
wangxiyuan
f8e76a49fa [CI] Upgrade trasnformers version (#6307)
Upgrade transformers to >=4.56.4

- vLLM version: v0.14.1
- vLLM main:
dc917cceb8

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-28 14:06:39 +08:00
wjunLu
72ffc00b86 [Bugfix] Fix structured outputs errors: TypeError: apply_token_bitmask_inplace_cpu() (#6151)
### What this PR does / why we need it?
Fix https://github.com/vllm-project/vllm-ascend/issues/5524

- vLLM version: v0.13.0
- vLLM main:
d68209402d

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-01-23 09:52:55 +08:00
wjunLu
a7d781f135 [Main] Upgrade PTA to 2.9.0 (#6112)
### What this PR does / why we need it?
Upgrade PTA to 2.9.0

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-01-22 17:59:06 +08:00
wangxiyuan
69740039b7 [CI] Upgrade CANN to 8.5.0 (#6070)
### What this PR does / why we need it?
1. Upgrade CANN to 8.5.0
2. move triton-ascend 3.2.0 to requirements

note: we skipped the two failed e2e test, see
https://github.com/vllm-project/vllm-ascend/issues/6076 for more detail.
We'll fix it soon.


### How was this patch tested?
Closes: https://github.com/vllm-project/vllm-ascend/issues/5494

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-22 09:29:50 +08:00
frankie
ec3563334b Add the requirement of arctic-inference which speculative decoding with suffix_decode (#5045)
### Does this PR introduce _any_ user-facing change?
suffix spec decode method rely on `arctic-inference` library. This PR
add it into requirements to make sure the function works by default
### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Signed-off-by: frankie <wangyongsheng686@gmail.com>
2026-01-05 19:15:49 +08:00
shaopeng-666
2a2d527e96 fix transformer version to 4.57.3 (#5250)
### What this PR does / why we need it?
In certain scenarios (such as smoke testing), the source code is used to
update the vllm-ascend version for running updated models (such as
Qwen3-VL). However, vllm and vllm-ascend themselves have no restrictions
on the transformer version, and the transformer will not be updated,
resulting in errors when launching the model.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
2025-12-23 23:55:40 +08:00
zhangxinyuehfad
18d2395f5e [Bugfix] fix fastapi version (#5047)
### What this PR does / why we need it?

fix fastapi version == 0.123.10(<0.124.0)

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-12-16 15:58:27 +08:00
wangxiyuan
0d14f635b4 upgrade torch npu version (#4433)
vLLM graph feature now rely on torch >=2.8. To make graph mode work, we
need upgrade torch version as well. For long term support, upgrade torch
to a newer one is good to go as well.

Related vLLM change: https://github.com/vllm-project/vllm/pull/25110


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
2025-12-01 19:01:55 +08:00
LHXuuu
bdc66972db [Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036)
### What this PR does / why we need it?

While using the LLM Compressor quantization tool from the VLLM community
to generate quantized weights, the VLLM Ascend engine needs to be
adapted to support the compressed tensors quantization format.

1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig
in vllm.
2. Support CompressedTensorsW8A8 static weight.
- weight: per-channel, int8, symmetric; activation: per-tensor, int8,
symmetric.
4. Support CompressedTensorsW8A8Dynamic weight.
- weight: per-channel, int8, symmetric; activation: per-token, int8,
symmetric, dynamic.
5. Modify the override_quantization_method in AscendQuantConfig.

Co-authored-by: taoqun110 taoqun@huawei.com
Co-authored-by: chenxi-hh chen464822955@163.com

- vLLM version: v0.11.2

---------

Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: chenxi-hh <chen464822955@163.com>
Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
2025-11-28 14:09:39 +08:00
SILONG ZENG
941d54a2ce [bugfix]Return the Transformer version from 4.57.2 to 4.57.1 (#4423)
### What this PR does / why we need it?
This PR pins the transformers dependency to 4.57.1.

Reason: CI tests (specifically test_completion_with_prompt_embeds.py)
are failing with an AttributeError: 'dict' object has no attribute
'model_type' when using newer versions of transformers.

The issue stems from a bug in tokenization_utils_base.py where the code
attempts to access the model_type field of a configuration dictionary
(_config) using dot notation (_config.model_type) instead of dictionary
key lookup (_config["model_type"] or _config.get("model_type")). This
occurs in the logic block checking for transformers_version <= 4.57.2.

Pinning the version to 4.57.1 bypasses this buggy code path and restores
CI stability.

Error Traceback:
``` shell
/usr/local/python3.11.13/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2419: 
if _is_local and _config.model_type not in [
E   AttributeError: 'dict' object has no attribute 'model_type'
```

- vLLM main:
2918c1b49c

Signed-off-by: MrZ20 <2609716663@qq.com>
2025-11-25 15:32:24 +08:00
wangxiyuan
fcc9a0eaeb Update torch-npu version to 2.7.1 (#3896)
### What this PR does / why we need it?
Upgrade torch-npu to the official release version 2.7.1


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-31 17:16:31 +08:00
wangxiyuan
10772d94e3 [Build] Force torch version (#3791)
We notice that sometimes user build vllm-ascend with incorrect torch
version. In this case, the build is passed, but when running the code,
the error `AttributeError: '_OpNamespace' '_C_ascend' object has no
attribute 'weak_ref_tensor'` is raised. Let's force the torch version to
2.7.1 and check the torch version when build from source to fix the
issue.

closes: #3342

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-30 15:53:15 +08:00
22dimensions
afc58184ec [Installation] limit opencv-python-headless version to resolve numpy version conflict (#3713)
### What this PR does / why we need it?

vllm requires opencv-python-headless >= 4.11.0 which requires
(numpy<2.3.0,>=2), but vllm-ascend numpy version must be less than
2.0.0, so limit opencv-python-headless less than 4.11.0.86 will fix this
conflict.

### How was this patch tested?

tested by CI 

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-10-25 18:07:54 +08:00
KyrieWang
60e2be1b36 [Feat] Dynamic Batch Feature (#3490)
[RFC](https://github.com/vllm-project/vllm-ascend/issues/3328) for more
details.
Add dynamic batch feature in chunked prefilling strategy, the token
budget can be refined to achieve better effective throughput and TPOT.

!!! NOTE: only 910B3 is supported till now, we are working on further
improvements.
Additional file for lookup table is required.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: Cheng Wang <wangchengkyrie@outlook.com>
2025-10-22 14:13:32 +08:00
wangxiyuan
ba19dd3183 Revert PTA upgrade PR (#3352)
we notice that torch npu 0919 doesn't work. This PR revert related
change which rely on 0919 version.
Revert PR: #3295  #3205  #3102 

Related: #3353

- vLLM version: v0.11.0
2025-10-10 14:09:53 +08:00
wangxiyuan
4abdcdba4e upgrade pta to 0919 (#3295)
### What this PR does / why we need it?
Upgrade torch-npu to the newest POC version
### Does this PR introduce _any_ user-facing change?
yes, user need upgrade the pta version as well.
### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-30 17:14:23 +08:00
wangxiyuan
9260910c8d [CI] Fix broken CI (#2302)
1. disable test_eagle_ccorrectness test, we'll reopen it once oom error
fixed.
2. drop transformers version limit for main, since vLLM rely on
>=4.55.0, see:
65552b476b
3. fix kv_connector_output bug, see:
796bae07c5

- vLLM version: v0.10.0
- vLLM main:
d1af8b7be9

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-11 11:22:32 +08:00
leo-pony
807f0895b2 Bump torch version to 2.7.1 (#1562)
### What this PR does / why we need it?
Bump torch version to 2.7.1, and cleanup infer schema patch
https://github.com/vllm-project/vllm-ascend/commit/857f489
(https://github.com/vllm-project/vllm-ascend/pull/837), this patch
depends on also: https://github.com/vllm-project/vllm-ascend/pull/1974

### Does this PR introduce any user-facing change?
No

#### How was this patch tested?
CI passed

torch-npu 2.7.1rc1 install guide:
https://gitee.com/ascend/pytorch/tree/v2.7.1/
install depending:
```
pip3 install pyyaml
pip3 install setuptools
```
install torch-npu:

Closes: https://github.com/vllm-project/vllm-ascend/issues/1866
Closes: https://github.com/vllm-project/vllm-ascend/issues/1390


- vLLM version: v0.10.0
- vLLM main:
9af654cc38

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-05 08:43:24 +08:00
Yikun Jiang
17a430f7b8 Upgrade vLLM to v0.10.0 (#1927)
### What this PR does / why we need it?
- Upgrade to v0.10.0
- Drop v0.9.2 version compatibility
- Add patch for
`vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py`
as workaround of
f3a683b7c9
for v0.10.0 and also add e2e test `test_models_prompt_logprobs`
- Pin transformers<4.54.0 as workaround of
https://github.com/vllm-project/vllm-ascend/issues/2034

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Test locally:
`VLLM_USE_MODELSCOPE=true pytest -sv
tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs`
- CI passed

- vLLM version: v0.9.2
- vLLM main:
7728dd77bb

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-26 15:43:29 +08:00
wangxiyuan
2b726d8f90 [CI] Fix broken CI (#1889)
1. vLLM commit
45badd05d0
changed the pooling check logic which broken vLLM Ascend.
2. vLLM commit
3e04107d97
requires higher version of transformers. The transformers version bug
has been fixed by
e936e401de.
We can safe to remove the version limit now.
3. vLLM commit
217937221b
added a new input `enable_eplb` for FusedMoe Ops

This PR fix the broken CI.


- vLLM version: v0.9.2
- vLLM main:
6a971ed692

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-20 02:11:57 +08:00
Mengqing Cao
2cf9c4c3a2 [CI/Build] Fix version conflict on transformers (#1490)
### What this PR does / why we need it?
Fix version conflict on transformers:
`pip._vendor.pkg_resources.ContextualVersionConflict: (transformers
4.53.0 (/usr/local/python3.10.17/lib/python3.10/site-packages),
Requirement.parse('transformers<4.53.0'), {'vllm-ascend'})`
Fix
https://github.com/vllm-project/vllm-ascend/actions/runs/15933263325/job/44947231642

### Does this PR introduce _any_ user-facing change?
Fix broken build

### How was this patch tested?
CI passed with new existing test.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-06-28 15:11:04 +08:00
Mengqing Cao
d59e7fa095 [CI] Pin transformers<4.53.0 and fix EPLB load_weights to make CI passed (#1482)
### What this PR does / why we need it?

- Fix vLLM EPLB break
e9fd658a73
by recovering load_weights back to [v0.9.1
version](07b8fae219)
temporarily.

- Fix transformers>=4.53.0 image processor break
Related: https://github.com/vllm-project/vllm-ascend/issues/1470

- Mirror torch_npu requirements to pyproject.toml

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-28 00:12:43 +08:00
Pleaplusone
7e6efbf2a9 update torch-npu to 2.5.1.post1.dev20250619 (#1347)
### What this PR does / why we need it?
This PR update the torch_npu to newest release version
2.5.1.post1.dev20250619 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI tested will guarantee the update

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-06-23 09:02:09 +08:00
Mengqing Cao
96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235)
### What this PR does / why we need it?
1. Fix rank set in DP scenario. The new poc version of torch-npu support
setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, thus we could use the
rank set in `DPEngineCoreProc` directly instead of calculating local
rank across dp by hand in the patched `_init_data_parallel`

Closes: https://github.com/vllm-project/vllm-ascend/issues/1170

2. Bump torch-npu version to 2.5.1.post1.dev20250528

Closes: https://github.com/vllm-project/vllm-ascend/pull/1242
Closes: https://github.com/vllm-project/vllm-ascend/issues/1232


### How was this patch tested?
CI passed with new added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
2025-06-16 23:09:53 +08:00
Yikun Jiang
4976b48b98 [Build] Move numba/quart to requirments and update DS baseline and sync graph typo fix (#1121)
### What this PR does / why we need it?
1. The dependency was introduced by
https://github.com/vllm-project/vllm-ascend/pull/874
- Move numba/quart from requirements-dev to requirments
- Align pyproject.toml with requirements

2. This patch also fix deepseek accuracy baseline which
https://github.com/vllm-project/vllm-ascend/pull/1118 was not addressed.
According to https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite the
gsm8k is about `41.1`

3. This also sync the vLLM upstream changes:
eaa2e51088

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
vllm ascend test (basic workflow)
vllm longterm test (spec decode)

Closes: https://github.com/vllm-project/vllm-ascend/issues/1120

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-08 22:33:37 +08:00
wangxiyuan
6193ba679b [CI] add codespell CI and fix format.sh (#827)
1. Fix format check error to make format.sh work
2. Add codespell check CI 
3. Add the missing required package for vllm-ascend.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-12 22:04:48 +08:00
whx
8b194ad12e [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694)
### What this PR does / why we need it?
- This PR proposes a P2P version of Disaggregated Prefill based on
llm_datadist which manages data transfer.

- This solution reconstructs previous offline single-node Disaggregated
Prefill solution, and supports multi-node and online serveing now.

- Currently this solution supports 1P1D situation of Deepseek hybrid
parallelism (P: TP+EP, D: DP+EP). Note that xPyD situation is considered
in the solution design, and will be supported soon within v1 engine.

---------

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: ganyi <pleaplusone.gy@gmail.com>
2025-05-01 22:31:36 +08:00
Yikun Jiang
2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661)
### What this PR does / why we need it?
The torch-npu 2.5.1 are published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev version from vllm-ascend code base

### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1

### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-27 17:28:29 +08:00
Bug Hunter Yan
05bdcbeae4 support aclgraph (#426)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->
This PR supports the access of vllm-acend to the piecewise_graph feature
provided by the v1 engine.

1. register unifiled_ascend_attention_with_output for piecewise_graph to
split graph.
2. support NPUGraph to accelerate kernel launch.

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
support npugraph to default, Users can disenable the npugraph feature by
configuring enforce_eager.

This has corresponding requirements for the versions of torch_npu and
CANN, and they need to support graph capture.

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->
it turn to default

---------

Signed-off-by: Bug Hunter Yan <yanpq@zju.edu.cn>
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-04-23 20:56:24 +08:00
wangxiyuan
bbe7ccd366 [MISC] Add patch module (#526)
This PR added patch module for vllm
1. platform patch: the patch will be registered when load the platform
2. worker patch: the patch will be registered when worker is started.

The detail is:
1. patch_common: patch for main and 0.8.4 version
4. patch_main: patch for main verison
5. patch_0_8_4: patch for 0.8.4 version
2025-04-16 09:28:58 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enable custom ops build  by default. 

### Does this PR introduce _any_ user-facing change?

Yes, users now install vllm-ascend from source will trigger custom ops
build step.

### How was this patch tested?
By image build and e2e CI

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Yikun Jiang
579d858a20 Set torchvision<0.21.0 to match torch/torch_npu version (#479)
### What this PR does / why we need it?
Set torchvision<0.21.0 to match torch/torch_npu version to resolve
`RuntimeError: operator torchvision::nms does not exist`.

Closes: https://github.com/vllm-project/vllm-ascend/issues/477

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-08 09:15:42 +08:00
Yikun Jiang
adabdeea7f Set numpy < 2.0.0 to resolve numpy VersionConflict (#476)
### What this PR does / why we need it?
vLLM bumps numpy version to 2.x:
8427f70493
, this will cause a
`pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4
(/usr/local/python3.10/lib/python3.10/site-packages),
Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when vllm
ascend install. This PR resolved the issue by:
- Set numpy < 2.0.0 to resolve numpy VersionConflict
- Sync requirements and toml 
- Reorder


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes: https://github.com/vllm-project/vllm-ascend/issues/473

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-07 16:07:21 +08:00
Pleaplusone
ce8259975e [core] Support custom ascendc kernels in vllm-ascend (#233)
This PR add custom ascendc kernel rotary_embedding support in
vllm-ascend, related CMakeLists and setuptools is also added in this PR.

Related: https://github.com/vllm-project/vllm-ascend/issues/156

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-03 14:52:34 +08:00
Mengqing Cao
2dbd763584 [CI] Fix mypy CI (#443)
### What this PR does / why we need it?
Fix CI by updating mypy and pining numpy version

_the modification of model_runner_v1 is just to make CI happy_

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-01 09:25:33 +08:00
Mengqing Cao
36991b2052 [CI] enable CI on all branch (#124)
Enable CI on all branch.
Installing with the torch-npu-2.5.1.dev20250218 so that we could enable
CI on all branch and prepare for merging 0.7.1-dev to main

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-21 16:16:48 +08:00
Huazhong Ji
c8b57d10b2 [Misc] update the dependency version of torch-npu (#50)
### What this PR does / why we need it?

This PR updates the dependency version of vllm-ascend on torch-npu, so
that the vllm-ascend can be installed in a later version environment
(like to torch-npu 2.6.0rc1),

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI Test

Signed-off-by: ji-huazhong <hzji210@gmail.com>
2025-02-12 15:50:38 +08:00
wangxiyuan
c59375caff [Misc] version control by setuptools_scm (#21)
make package version control by setuptools_scm to keep the same with
vllm

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-10 09:36:09 +08:00
Yikun Jiang
d5e7756028 [Core] Init vllm-ascend (#3)
### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU with vLLM.

This patch also include changes to make CI work and use cache speed up
e2e test, including:
1. Change push (post merge ci) and pull_request (pr ci) trigger branch
to main
   2. Make mypy work by ignore base_communicator and clear unused deps
   3. Several improvements for vllm_ascend_test:
     - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins)
- switch `git clone` command to `action/checkout` to speedup checkout
and
     - Enable sv for pytest for better info dump
- Remove network host to resole `docker: conflicting ontions: cannot
attach both user-defined and non-user-definednetwork-modes`, which is a
problem on docker 1.45 but not on 1.39.
4. Adapt MLA decode optimizations:
cabaf4eff3

### Does this PR introduce _any_ user-facing change?
Yes, init the PR.

### How was this patch tested?
- This is the first PR to make ascend NPU work on vLLM. All code is
tested on ascend with vLLM V0 Engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-02-05 10:53:12 +08:00