33 Commits

Author SHA1 Message Date
zhangxinyuehfad
d781902ce9 [v0.18.0][CI] Fix releases/v0.18.0 ci test only support vllm v0.18.0 (#7686)
### What this PR does / why we need it?
Fix releases/v0.18.0 ci test only support vllm v0.18.0 

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-03-26 18:36:04 +08:00
Nengjun Ma
fcba91a392 Main2main Upgrade vllm commit to 0320 17:00 (#7510)
### What this PR does / why we need it?
Main2main Upgrade vllm commit to 0320 17:00

1. fix vllm refactored `_moe_forward` to call
`runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True.
vllm PR:"[MoE Refactor] DefaultMoERunner simplification
[#33049](https://github.com/vllm-project/vllm/pull/33049)"

2.fix vllm moved the call to `self._set_compile_ranges()` in
`VllmConfig.__post_init__` from **before** `check_and_update_config()`
to **after** it (to allow platforms to lower `max_num_batched_tokens`
first). vllm PR: "fix(xpu): Re-compute compile ranges after
platform-specific config updates"
[#37523](https://github.com/vllm-project/vllm/pull/37523)


### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main:
8b6325758c

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
2026-03-23 21:37:41 +08:00
Nengjun Ma
8e2c59e1ee Main2main upgrade vllm commit to 03 19 17:00 (#7478)
### What this PR does / why we need it?
Upgrade vllm commit to 2026.03.19.

1.Fix socket removed from StatelessProcessGroup. Upstream vLLM PR
[#36330](https://github.com/vllm-project/vllm/pull/36330) ("elastic_ep:
Fix stateless group port races") refactored StatelessProcessGroup and
removed the socket: socket.socket | None field. The socket ownership was
moved to a new create_tcp_store() helper instead of being stored as a
field on the dataclass.

2.fix `virtual_engine` parameter removed from `set_forward_context().
Upstream [V0 Deprecation] Deprecate virtual engine
[#37195](https://github.com/vllm-project/vllm/pull/37195)

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main:
8b6325758c

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-03-23 16:25:57 +08:00
Nengjun Ma
ee804ce23e Main2main upgrade vllm to 0318 commit (#7412)
### What this PR does / why we need it?
Upgrade vllm commit to 0318. 

Main content: Added a pre-operation for cleaning up and waiting(default
max 50s) for the completion of the clean up of the NPU memory to some
test cases that failed due to the failure to release the NPU memory in a
timely manner when the previous test cases were executed.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main:
4497431df6

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-03-19 17:17:36 +08:00
Nengjun Ma
8b79d4de52 Main2main upgrade to vllm 0317 afternoon (#7409)
### What this PR does / why we need it?

1.fix "TypeError: get_attn_backend() remove variable": [Refactor
`check_and_update_config`](https://github.com/vllm-project/vllm/pull/35122)

2.fix [Rename `compile_ranges_split_points` to
`compile_ranges_endpoints`](https://github.com/vllm-project/vllm/pull/36027)

3.fix "RuntimeError: device_allocator not a DeviceAllocator":[Replace
memory related torch.cuda
APIs"](https://github.com/vllm-project/vllm/pull/37031)

4.fix [Support multiple KV groups in OffloadingSpec
](https://github.com/vllm-project/vllm/pull/36610) removed
self.offloaded_block_size and changed self.gpu_block_size from a scalar
to a tuple of per-group block sizes, adding block_size_factor.

5.fix [Consolidate
SupportsEagle](https://github.com/vllm-project/vllm/pull/36063) renamed
get_eagle3_aux_hidden_state_layers() to
get_eagle3_default_aux_hidden_state_layers() and added a
supports_eagle3() guard before calling it.

### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
E2E


- vLLM version: v0.17.0
- vLLM main:
8a680463fa

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
2026-03-18 23:24:27 +08:00
zhangyiming
1c954ff264 [main2main] upgrade vllm to 0308 (#7213)
### What this PR does / why we need it?
Update main2main to vllm 0308.
breaks:

* https://github.com/vllm-project/vllm/pull/30681
* https://github.com/vllm-project/vllm/pull/35552 remove
self.cudagraph_batch_sizes
* https://github.com/vllm-project/vllm/pull/35158 clear_metadata ->
defer_finalize
* https://github.com/vllm-project/vllm/pull/36006 remove
CacheConfig.cpu_offload_gb
* https://github.com/vllm-project/vllm/pull/35472
* https://github.com/vllm-project/vllm/pull/34552 attn_metadata_builder
* https://github.com/vllm-project/vllm/pull/30515 profile_seq_lens
* https://github.com/vllm-project/vllm/pull/28053 

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
2026-03-18 09:24:43 +08:00
SILONG ZENG
bd571cf6d6 [Main2Main] Upgrade vLLM to 0303 (#6944)
### What this PR does / why we need it?
break:
- https://github.com/vllm-project/vllm/pull/34102 
Disable_full param replaced with valid_modes/invalid_modes API
- https://github.com/vllm-project/vllm/pull/35503
Now must return float compilation_time
- https://github.com/vllm-project/vllm/pull/35564
New sequence_lengths param added
- https://github.com/vllm-project/vllm/pull/33807
A check was performed (if runner_backend != "auto")
- https://github.com/vllm-project/vllm/pull/34861
`BaseDeviceCommunicator` now accesses PyTorch's internal `pg_map` to
check process group state
- https://github.com/vllm-project/vllm/pull/35274

**Important change:**
- https://github.com/vllm-project/vllm/pull/28672

`matcher_utils` directly accesses `torch.ops._C.*` during the import
phase. In the Ascend environment, some unregistered ops trigger
`AttributeError`, causing e2e initialization failure.

https://github.com/vllm-project/vllm-ascend/actions/runs/22607260487/job/65502047131#step:10:2323

https://github.com/vllm-project/vllm/blob/main/vllm/compilation/passes/fusion/matcher_utils.py#L29

This PR adds temporary compatibility placeholders (rms_norm,
fused_add_rms_norm, rotate_embedding, static/dynamic fp8 quant,
silu_and_mul) to
`vllm_ascend/patch/platform/patch_fusion_matcher_compat_ops.py` to
ensure no crashes during the import phase. Upstream repairs will be
considered later.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
2026-03-06 09:08:52 +08:00
Canlin Guo
e4458b2d2b [Main2Main] Upgrade vLLM to 0226 (#6813)
### What this PR does / why we need it?

Breaking:
1. https://github.com/vllm-project/vllm/pull/33452
2. https://github.com/vllm-project/vllm/pull/33451
3. https://github.com/vllm-project/vllm/pull/32567
4. https://github.com/vllm-project/vllm/pull/32344

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
2026-02-27 16:05:21 +08:00
Icey
ee59429015 upgrade main to 0212 (#6712)
### What this PR does / why we need it?
Fixes `transformers_utils/processors/__init__` import error, due to
https://github.com/vllm-project/vllm/pull/33247
Fixes Fused MoE break introduced by `MoERunner abstraction,` due to
https://github.com/vllm-project/vllm/pull/32344

> delete AscendMoERunnere when
https://github.com/vllm-project/vllm/pull/35178 is merged

Fixes `Make Qwen3VL compatible with Transformers v5`, due to
https://github.com/vllm-project/vllm/pull/34262

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-02-25 09:17:29 +08:00
Icey
88773bb101 [main to main] upgrade main 0210 (#6673)
### What this PR does / why we need it?
upgrade vllm commit to `9562912cead1f11e8540fb91306c5cbda66f0007`

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
all tests passed

- vLLM version: v0.15.0
- vLLM main:
13397841ab

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-02-11 18:10:14 +08:00
wangxiyuan
2a826b5fad [Misc] upgrade to vllm main (#6646)
### What this PR does / why we need it?
This PR upgrades the core vLLM dependency to a newer version from the
main branch (`13397841ab469cecf1ed425c3f52a9ffc38139b5`). This is
necessary to keep our project up-to-date with the latest features and
fixes from upstream vLLM.

1.
ac32e66cf9
pass file is moved.

- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
2026-02-10 14:08:59 +08:00
meihanc
922e5c163b [main2main] upgrade vllm main 0202 (#6560)
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required
positional argument: 'is_sequence_parallel'` due to
https://github.com/vllm-project/vllm/pull/32567
2. Fix ` TypeError: '>' not supported between instances of 'MagicMock'
and 'int'` due to https://github.com/vllm-project/vllm/pull/33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with
abstract methods forward_mha, forward_mqa` and AttributeError: 'bool'
object has no attribute 'process_weights_after_loading' due to
https://github.com/vllm-project/vllm/pull/33284
4. Fix `'AscendSharedFusedMoE' object has no attribute
'_routed_input_transform'`due to
https://github.com/vllm-project/vllm/pull/32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument
'num_active_loras'` due to
https://github.com/vllm-project/vllm/pull/32005
6. Fix the problem caused by` 'tuple' object has no attribute 'job_id'`
due to https://github.com/vllm-project/vllm/pull/27492
7. Fix the problem that all_moe_layers is not equal to vllm.moe_forward,
vllm.moe_forward_shared due to
https://github.com/vllm-project/vllm/pull/33184
8. Add patch to fix the problem "got multiple values for keyword
argument 'add_special_tokens'" due to
https://github.com/vllm-project/vllm/pull/32863
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-02-05 19:31:17 +08:00
wangxiyuan
eeedf7c503 [Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470)
### What this PR does / why we need it?
This PR upgrades the vLLM dependency from `v0.14.1` to `v0.15.0`. This
involves:
- Updating the `VLLM_TAG` in all `Dockerfile`.
- Updating the vLLM version in `docs/source/conf.py`.
- Removing conditional code paths specific to `v0.14.1` across the
codebase, which simplifies maintenance.
- Fix `TypeError: MMEncoderAttention.__init__() got an unexpected
keyword argument 'multimodal_config'` due to
https://github.com/vllm-project/vllm/pull/31972.
- Fix `_shared_experts: 'NoneType' object is not callable` due to
https://github.com/vllm-project/vllm/pull/32082 by
https://github.com/vllm-project/vllm-ascend/pull/6335.
- Fix `ReshapeAndCacheOperation setup failed!` due to
https://github.com/vllm-project/vllm/pull/25954 by overriding attention
metadata slots.

This upgrade is necessary to keep the project aligned with the latest
features, bug fixes, and API changes in the vLLM project.

### Does this PR introduce _any_ user-facing change?
No, this is an internal dependency update and does not introduce any
user-facing changes.

### How was this patch tested?
CI is expected to pass with these changes, ensuring that all existing
tests are successful with the new vLLM version.

- vLLM version: v0.14.1
- vLLM main:
dc917cceb8


co-authored-by: shen-shanshan <467638484@qq.com>

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-02 15:57:55 +08:00
meihanc
fea197ad50 [Main2Main] Upgrade vllm commit to 0123 (#6169)
### What this PR does / why we need it?
1.  Upgrade vllm commit to: 0115
(8471b27df97c3eb79f891802fc0e858f8f7ac6a0)
Modify import paths due to the refactors:
https://github.com/vllm-project/vllm/pull/32245
https://github.com/vllm-project/vllm/pull/32060
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21034239336/job/60490156965?pr=5913
2. Upgrade vllm commit to: 0119
(9a1f16da1e423ede2c2f52a9850cbfbb39cefe96)
Fix `WorkerProc.__init__() missing 1 required positional argument:
'is_driver_worker'` due to
https://github.com/vllm-project/vllm/pull/28506
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21156263050/job/60841668755?5569
3. Upgrade vllm commit to:
0120(148117ea2e689cd43df4be6892671a17cdae5833)
1. Add `skip_compiled` param in `set_forward_context` due to
https://github.com/vllm-project/vllm/pull/30385
2. Modify `tests/ut/spec_decode/test_eagle_proposer.py` due to
https://github.com/vllm-project/vllm/pull/24322
change `self.max_num_tokens =
vllm_config.scheduler_config.max_num_batched_tokens + max_batch_size`
3. Modify UT import paths due to the
refactors:https://github.com/vllm-project/vllm/pull/32060
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21204851770/job/60999046946
4. Upgrade vllm commit to:
0121(f23fb5a7c1b61350c5c40ca1115d3bf8cf2b8cc9)
1. vLLM switched `uses_mrope` from target to draft model config, making
`positions`/`mrope_positions` mutually exclusive, breaking vllm-ascend's
direct self.positions access and tests missing
`draft_model_config.uses_mrope`.
https://github.com/vllm-project/vllm/pull/32048
2. Moved bs_to_padded_graph_size from CompilationConfig to
CudagraphDispatcher due to the refactor
https://github.com/vllm-project/vllm/pull/30143
3. Remove unused `maybe_setup_kv_connector` due to
https://github.com/vllm-project/vllm/pull/32077
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21217728738/job/61043738834
6. Upgrade vllm commit to:
0122(8ebf271bb6d1e7e9b1a55be73d755ef1a57dbbe5)
Updating FusedMoEParallelConfig (added enable_eplb) and FusedMoEConfig
due to https://github.com/vllm-project/vllm/pull/32414
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21249922546/job/61148613054
8. Upgrade vllm commit to:
0123(dc917cceb877dfd13f98c538c4c96158047d98bd)
Setting temperature=0.0 due to the removal of the default temperature
value in https://github.com/vllm-project/vllm/pull/32723
Test result:
https://github.com/vllm-project/vllm-ascend/actions/runs/21280796875
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.0
- vLLM main:
d68209402d

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wjunLu <wjunlu217@gmail.com>
2026-01-27 08:44:36 +08:00
dependabot[bot]
ae71c4237e [CI] Bump actions/setup-python from 6.1.0 to 6.2.0 (#6256)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 6.1.0 to 6.2.0.

- vLLM version: v0.14.1
- vLLM main:
d68209402d

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-26 14:20:14 +08:00
meihanc
ea57e3e7a4 [Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5988)
### What this PR does / why we need it?
Upgrade vllm commit to releases/v0.14.0

- vLLM version: v0.13.0
- vLLM main:
2c24bc6996

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2026-01-20 15:10:40 +08:00
wjunLu
73a3f822c7 [Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5911)
### What this PR does / why we need it?
Upgrade vllm commit to releases/v0.14.0

- Re-open cases in `tests/e2e/singlecard/pooling/test_scoring.py`, since
the errors before have been fixed by
https://github.com/vllm-project/vllm/pull/32243
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
11b6af5280

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-01-15 23:22:43 +08:00
wjunLu
c11a05c4e1 [Main2Main] Upgrade vllm commit to 0113 (#5839)
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
https://github.com/vllm-project/vllm/pull/31916
https://github.com/vllm-project/vllm/pull/32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
https://github.com/vllm-project/vllm/pull/24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
https://github.com/vllm-project/vllm/pull/31998

- Skip some pooling tests, which are caused by
https://github.com/vllm-project/vllm/pull/32148
where vllm is also failed
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests when main2main reachs
https://github.com/vllm-project/vllm/pull/32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
https://github.com/vllm-project/vllm/pull/32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-01-15 09:48:53 +08:00
zhangxinyuehfad
f7b904641e [Main2Main] Upgrade vllm commit to 0109 (#5752)
### What this PR does / why we need it?
Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df)

1. remove `init_cached_hf_modules ` due to
https://github.com/vllm-project/vllm/pull/31786
2. fix spec_decode e2e test due to
https://github.com/vllm-project/vllm/pull/29821 break
3. fix `vllm.v1.attention.backends.utils` duo to
https://github.com/vllm-project/vllm/pull/31891
4. fix `self.seq_lens - query_lens` on same device due to
https://github.com/vllm-project/vllm/pull/31773
5. skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has
no attribute 'get_cuda_view_from_cpu_tensor'`

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-13 19:14:43 +08:00
dependabot[bot]
86c4bea116 Bump actions/checkout from 4 to 6 (#5795)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 20:44:23 +08:00
wangxiyuan
354ee3b330 [Doc] Update doc url link (#5781)
Drop `dev` suffix for doc url.
Rename url to `https://docs.vllm.ai/projects/ascend`

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-12 11:21:31 +08:00
wangxiyuan
264cc254cc [CI] fix image build tag (#5703)
ref doesn't work with workflow_dispatch, let's change it to raw way

This PR also merge the pr_create job into one runner to save resource.
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-08 09:27:45 +08:00
wjunLu
b8f245792e [Main2Main] Upgrade vllm commit to 0106 (#5617)
### What this PR does / why we need it?
Upgrade vllm commit to 0106

- vLLM version: v0.13.0
- vLLM main:
8be6432bda

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-01-06 15:50:40 +08:00
wjunLu
3cf059a72b [Main2Main] Upgrade vllm commit to 0105 (#5595)
### What this PR does / why we need it?

Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)

1. Remove `maybe_padded_num_tokens` arg in `model_runner_v1.py` since
https://github.com/vllm-project/vllm/pull/31517 deleted unused arg

2. Remove dense `Qwen/Qwen3-0.6B` in
`tests/e2e/multicard/test_aclgraph_capture_replay.py` and
`tests/e2e/multicard/test_data_parallel.py` due to
https://github.com/vllm-project/vllm/pull/30739
where offline data parallel mode will not be supported/useful for dense
models

3. Adapt `vllm_ascend/worker/worker.py` due to
https://github.com/vllm-project/vllm/pull/31584

4. Adapt `self.block_size` calling due to
https://github.com/vllm-project/vllm/pull/31540

5. Modify `test_mla_v1.py` due to
https://github.com/vllm-project/vllm/pull/28454 , which refactorred
`get_head_size()`

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
7157596103

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-01-06 08:44:29 +08:00
wjunLu
3c2d3e52e5 [Main2Main] Upgrade vllm commit to 1230 (#5495)
### What this PR does / why we need it?

Upgrade vllm commit to 1230

Affected by https://github.com/vllm-project/vllm/pull/27614 (and the
core PR https://github.com/vllm-project/vllm/pull/26866), we have to
make the following changes:

1. Modify `tests/e2e/multicard/test_aclgraph_capture_replay.py` to keep
compatible with both vllm version of `v0.13.0` and latest main commitID,
while vllm enables async scheduling by default
2. Skip `test_guided_decoding.py` due to xgrammar errors
(https://github.com/vllm-project/vllm-ascend/issues/5524)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2025-12-31 09:44:35 +08:00
Nengjun Ma
5e96f94d2a Update corresponding vllm commit ID to 12 29 (#5475)
### What this PR does / why we need it?
- Fixes vllm break:
1. [[BugFix] register quant scale tensors as buffer #31395]
(https://github.com/vllm-project/vllm/pull/31395)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
5326c89803

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-12-29 22:48:05 +08:00
ZT-AIA
24328aaf00 update vllm pin to 12.27 (#5412)
### What this PR does / why we need it?
update vllm pin to 12.27
1、Fix Qwen2-MoE shared_expert_gate
:https://github.com/vllm-project/vllm/pull/31339
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
vLLM version: release/v0.13.0
vLLM main:
5326c89803
Co-authored-by: leo-pony [nengjunma@outlook.com](nengjunma@outlook.com)

---------

Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
2025-12-28 00:19:36 +08:00
ZT-AIA
1d8aa892bf Update vllm pin to 12.26 (#5378)
### What this PR does / why we need it?
Update vllm pin to 12.26
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

---------

Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-26 23:44:48 +08:00
ZT-AIA
adaa89a7a5 Update vllm pin to 12.25 (#5342)
### What this PR does / why we need it?
- Fix vllm break in the pr:
1.[Drop v0.14 deprecations
]https://github.com/vllm-project/vllm/pull/31285
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

---------

Signed-off-by: ZT-AIA <1028681969@qq.com>
2025-12-26 14:05:40 +08:00
Nengjun Ma
42c989a437 Update vllm pin to 12.24 (#5307)
### What this PR does / why we need it?
Fix vllm break in the pr:
1. [Add MiMo-V2-Flash support]
(https://github.com/vllm-project/vllm/pull/30836)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Co-authored-by: zxwang [1476209578@qq.com](mailto:1476209578@qq.com)

- vLLM version: release/v0.13.0
- vLLM main:
5fbfa8d9ef

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Co-authored-by: zxwang <1476209578@qq.com>
2025-12-24 17:24:31 +08:00
Nengjun Ma
3b59f20a28 update to vllm 12-19 (#5223)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
Fix vllm break:
1. [Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4%
TTFT improvement] (https://github.com/vllm-project/vllm/pull/29558)
Fix Solution: Add the now-necessary `all2all_backend` parameter. The
impact of this parameter on the original `set_splitting_ops_for_v1`
implementation is only that graph mode is disabled in `vllm` if
`deepep_high_throughput` is enabled; it has no effect on the
`vllm-ascend` logic.

2.[Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention
interface ] (https://github.com/vllm-project/vllm/pull/30684)
Fix Solution: The reason why the GPU does not need to convert qkv to 3D
is that the GPU's flash_attention operator is compatible with 3D and 4D
(b s h d and s b ( h d)), but the NPU's flash_attention_unpad operator
only supports 3D (s b ( h d)). Therefore, we need to introduce the
reshape_qkv_to_3d operation.

4.Skip Tencent-Hunyuan/HunyuanOCR test case, as it has following issue
in upgrade vllm code:
https://github.com/vllm-project/vllm-ascend/issues/5297

### How was this patch tested?


Co-authored-by: zxwang <1476209578@qq.com>

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Co-authored-by: zxwang <1476209578@qq.com>
2025-12-23 23:52:11 +08:00
wangxiyuan
636265be6d [CI] Improve CI (#5078)
Raname workflow to be clear.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-19 15:34:35 +08:00
wangxiyuan
8090914d69 [CI] CI refactor (#4928)
1. rename workflow to better name
2. fix lint error
3. remove accuracy report doc and test

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-14 11:09:56 +08:00