xc-llm-ascend/vllm_ascend at 77ea8732241bfd80160824b17d1aee75909e1c24

EngineX/xc-llm-ascend

Yizhou 56f5d3bd49 [Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6357)

### What this PR does / why we need it?
This PR pads `query_start_loc` so that it satisfies the FIA/TND layout
constraint. It handles both uniform and mixed batches (by inserting a
dummy request for mixed batches), consolidates ad-hoc padding into a
single helper, copies the updated buffer to the device, and asserts the
layout constraint before building the attention metadata. Together,
these changes prevent kernel mismatches or failures and ensure correct
shapes for FIA/TND execution in full graph modes.

We currently place this helper in `execute_model`. My original design
was to include it in `_prepare_inputs`, but that doesn’t work because
the helper must run after padding. I’d prefer to minimize the impact and
reuse as much of the base class as possible in the future, but that
doesn’t seem achievable at the moment.
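
The padding described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name `pad_query_start_loc`, its parameters, and the exact form of the constraint (the last cumulative offset must equal the padded token count) are assumptions made for this example. It works on plain Python lists, so the device copy mentioned above is not shown.

```python
def pad_query_start_loc(query_start_loc, num_padded_tokens, uniform_query_len=None):
    """Return a padded copy of query_start_loc (hypothetical helper).

    Assumption: the layout constraint is that the last cumulative offset
    equals the padded token count. The real FIA/TND rule may differ.
    """
    padded = list(query_start_loc)
    if not padded or padded[0] != 0:
        raise ValueError("query_start_loc must start at 0")

    remaining = num_padded_tokens - padded[-1]
    if remaining < 0:
        raise ValueError("padded token count is smaller than the real token count")
    if remaining == 0:
        return padded

    if uniform_query_len is not None:
        # Uniform batch: add dummy requests that each have the same query length.
        if remaining % uniform_query_len != 0:
            raise ValueError("padding is not a multiple of the uniform query length")
        for _ in range(remaining // uniform_query_len):
            padded.append(padded[-1] + uniform_query_len)
    else:
        # Mixed batch: one dummy request absorbs all padding tokens.
        padded.append(num_padded_tokens)

    # Check the assumed constraint before the offsets are used any further.
    assert padded[-1] == num_padded_tokens
    return padded


# Uniform batch: three 1-token requests padded to 5 tokens.
uniform = pad_query_start_loc([0, 1, 2, 3], 5, uniform_query_len=1)
# Mixed batch: requests of 3 and 1 tokens padded to 8 tokens.
mixed = pad_query_start_loc([0, 3, 4], 8)
```

Keeping both cases in one helper mirrors the consolidation the description mentions, and the final assertion stands in for the layout check that runs before the attention metadata is built.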

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Test cases added.

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2026-01-30 16:41:44 +08:00

[Fixbugs]: fix refactor cause to 310p chunkprefill error (#6340)

2026-01-28 16:41:32 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[bugfix](pcp,gqa) set kv_inverse_idx_for_chunk and cp_kv_recover_idx_for_chunk to None when dcp only (#6317)

2026-01-29 19:35:52 +08:00

[e2e Test][npugraph_ex]add static kernel e2e test case (#6320)

2026-01-30 16:24:48 +08:00

[0.14.1][bugfix][sched] fix incompatibility of RecomputeScheduler with vllm v0.14.1 (#6286)

2026-01-28 20:16:58 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[P/D] Using the cache load operator to replace the index select operator. (#6295)

2026-01-30 14:27:53 +08:00

[EPLB][Bugfix] EPLB support fp/bf16 (#5531)

2026-01-26 14:28:16 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

[Main2Main][BugFix] Add shared_experts check for AscendSharedFusedMoE (#6335)

2026-01-29 08:47:20 +08:00

[Misc] Drop deepseek patch (#6288)

2026-01-29 14:45:50 +08:00

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

[ops] support advanced apply_top_k_top_p without top_k constraint (#6098)

2026-01-26 09:08:42 +08:00

Qwen3-VL-MoE EAGLE support for vLLM-Ascend (#6327)

2026-01-29 16:44:30 +08:00

[Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6357)

2026-01-30 16:41:44 +08:00

[CI] optimize lint term (#5986)

2026-01-22 15:46:59 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[BugFix] Disable enable_shared_expert_dp by default if tensor_parallel_size=1 (#6361)

2026-01-28 22:01:01 +08:00

ascend_forward_context.py

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

batch_invariant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

envs.py

Default enable MLAPO (#5952)

2026-01-22 09:26:39 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

platform.py

[Misc] Removes unnecessary graph size re-initialization (#6280)

2026-01-27 14:38:07 +08:00

profiling_config.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

utils.py

[Misc] Removes unnecessary graph size re-initialization (#6280)

2026-01-27 14:38:07 +08:00