xc-llm-ascend/vllm_ascend at f4605c2b3cb09dbc97887261a239b8e5478871b9

EngineX/xc-llm-ascend

Yizhou f4605c2b3c [Fix] Fixes speculative decode indexing and unpad condition for attention metadata (#5626)

### What this PR does / why we need it?
This addresses the issues raised in #5356 and #4963; we believe the
unnecessary unpad conditions are the root cause.

Change the unpad trigger to be driven by actual size mismatches
(num_reqs vs base_num_reqs or scheduled vs input token counts) rather
than specific speculative-method flags. Then remove brittle workarounds
that forced request counts and sliced query start locations.

This prevents incorrect indexing and length mismatches during
speculative decoding and makes metadata unpadding more robust across
scheduling modes.
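
The size-driven trigger described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: `num_reqs` and `base_num_reqs` come from the description, while `BatchSizes`, `num_scheduled_tokens`, `num_input_tokens`, and `needs_unpad` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BatchSizes:
    """Hypothetical container for the sizes compared by the unpad check."""
    num_reqs: int              # request count seen by the attention metadata
    base_num_reqs: int         # request count before any padding
    num_scheduled_tokens: int  # tokens the scheduler actually scheduled (assumed name)
    num_input_tokens: int      # tokens in the possibly padded input (assumed name)


def needs_unpad(sizes: BatchSizes) -> bool:
    """Return True when any actual size mismatch is present.

    The decision depends only on the sizes themselves, not on which
    speculative-decoding method is enabled.
    """
    return (sizes.num_reqs != sizes.base_num_reqs
            or sizes.num_scheduled_tokens != sizes.num_input_tokens)
```

Because the check looks only at sizes, a new speculative method would not need its own flag in this condition under this sketch.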

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Covered by existing test cases.

- vLLM version: v0.13.0
- vLLM main: 8be6432bda

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2026-01-08 19:41:08 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[Fix] Fixes speculative decode indexing and unpad condition for attention metadata (#5626)

2026-01-08 19:41:08 +08:00

[BugFix][Fusion] Fix graph fusion failure problem (#5676)

2026-01-07 18:42:55 +08:00

[CI] fix lint (#5216)

2025-12-20 17:03:25 +08:00

device_allocator

[Refactor] Cleanup platform (#5566)

2026-01-07 09:25:55 +08:00

[bugfix] adapt to new implemented get_kv_cache_spec in cpuoffload connector (#4311)

2026-01-08 09:15:09 +08:00

[Bugfix] Revert pr4214 multi-stream collect expert hotpot (#5529)

2026-01-07 11:26:47 +08:00

[BugFix] Fix npu-cpu offloading interface change bug. (#5290)

2025-12-27 10:21:20 +08:00

[BugFix]Fix precision issue for LoRA feature (#4141)

2025-12-19 14:22:06 +08:00

[CI] speed up ut (#4901)

2025-12-11 18:45:43 +08:00

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542)

2026-01-08 09:15:55 +08:00

[BugFix][Fusion] Fix graph fusion failure problem (#5676)

2026-01-07 18:42:55 +08:00

[Feature]EPLB:Adapt DispatchGmmCombineDecode operator to eplb tensor list and expert token numbers (#5552)

2026-01-07 11:23:42 +08:00

[Feature] add the magicmtp speculative decoding acceleration algorithm (#5542)

2026-01-08 09:15:55 +08:00

[Fix] Fixes speculative decode indexing and unpad condition for attention metadata (#5626)

2026-01-08 19:41:08 +08:00

[Fix] Fixes speculative decode indexing and unpad condition for attention metadata (#5626)

2026-01-08 19:41:08 +08:00

[Bugfix] fix dcp_only bug and add e2e accuracy test for dcp only and pcp only (#5565)

2026-01-06 22:48:21 +08:00

__init__.py

clean up model module (#4611)

2025-12-02 17:35:47 +08:00

ascend_config.py

[refactor] Refactor the interface for shard weight and remove the flashcomm2 o_shared interface. (#5181)

2026-01-08 09:05:02 +08:00

ascend_forward_context.py

[Feature]EPLB:Adapt DispatchGmmCombineDecode operator to eplb tensor list and expert token numbers (#5552)

2026-01-07 11:23:42 +08:00

batch_invariant.py

[Feature] implement basic framework for batch invariant (#5517)

2026-01-07 09:11:26 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[refactor] Refactor the interface for shard weight and remove the flashcomm2 o_shared interface. (#5181)

2026-01-08 09:05:02 +08:00

flash_common3_context.py

[Perf]enable prefill flashcommon3 (#4065)

2025-12-14 09:34:13 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Optimize the print info format when deprecated code is used in vllm-ascend (#5696)

2026-01-08 09:26:49 +08:00

profiling_config.py

Drop ascend scheduler (#4623)

2025-12-05 09:03:45 +08:00

utils.py

[refactor] Refactor the interface for shard weight and remove the flashcomm2 o_shared interface. (#5181)

2026-01-08 09:05:02 +08:00