Zetong Li 054fde7b72 [0.18.0][BugFix] Fix attention state of short prompt for correct forwarding (#8088)
### What this PR does / why we need it?
This PR is a cherry-pick of #8029.

This PR fixes the attention state of short prompts so they are forwarded
correctly. A batch of short prompts (with prefill token count less than or
equal to num_spec_tokens + 1) is classified as decode requests by
split_decodes_and_prefills, which contradicts the batch's original
PrefillNoCache attention state. As a result, these short prompts are routed
into a mismatched branch and cause errors.
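The misclassification described above can be sketched as follows. This is a
hypothetical, simplified illustration of the threshold logic, not the actual
signature of split_decodes_and_prefills in vLLM; the helper name
classify_request and its parameters are assumptions for demonstration only.

```python
# Hypothetical sketch of the classification threshold described above.
# The real split_decodes_and_prefills operates on batch metadata; this
# illustrates only the token-count comparison that triggers the bug.
def classify_request(num_prompt_tokens: int, num_spec_tokens: int) -> str:
    # Requests whose prefill length fits within the decode token budget
    # (num_spec_tokens + 1) are treated as decode requests.
    if num_prompt_tokens <= num_spec_tokens + 1:
        return "decode"
    return "prefill"

# With num_spec_tokens = 2, a 2-token prompt is classified as a decode,
# even though its attention state was prepared as PrefillNoCache, so it
# would be dispatched to the wrong forwarding branch.
print(classify_request(2, 2))   # short prompt -> "decode"
print(classify_request(16, 2))  # long prompt  -> "prefill"
```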

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested by CI.

Signed-off-by: Zetong Li <slippersss@126.com>
2026-04-09 21:21:24 +08:00