xc-llm-ascend/vllm_ascend
wangbj127 f2956ce944 [v0.18.0][BugFix] Fix dimension mismatch error when SP padding causes num_tokens_padded != num_tokens_unpadded (#8133)
Cherry-picked from https://github.com/vllm-project/vllm-ascend/pull/7858

### What this PR does / why we need it?
This PR fixes a `RuntimeError` (dimension mismatch) that occurs when
Sequence Parallelism (SP) is enabled and the padding added for SP causes
`num_tokens_padded` to differ from `num_tokens_unpadded`. In such cases,
`_pad_query_start_loc_for_fia` adds a dummy request, increasing
`num_reqs_padded`. This mismatch between the actual number of requests
and the padded number of requests leads to errors in downstream token
count computations (e.g., `compute_num_computed_tokens`).
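
A toy sketch of the failure mode described above, not the actual vLLM-Ascend code. `pad_query_start_loc` is a hypothetical stand-in for `_pad_query_start_loc_for_fia`, and the assumption that the dummy request simply owns the padding tokens is ours:

```python
def pad_query_start_loc(query_start_loc: list[int], num_tokens_padded: int) -> list[int]:
    """Hypothetical stand-in: append a dummy request that owns the padding tokens."""
    num_tokens_unpadded = query_start_loc[-1]
    if num_tokens_padded > num_tokens_unpadded:
        # The padded tail becomes one extra (dummy) request.
        return query_start_loc + [num_tokens_padded]
    return list(query_start_loc)


# Three real requests with 3, 2 and 2 tokens, i.e. 7 unpadded tokens.
query_start_loc = [0, 3, 5, 7]
num_reqs = len(query_start_loc) - 1

# Assume SP padding rounds the token count up to 8.
padded = pad_query_start_loc(query_start_loc, num_tokens_padded=8)
num_reqs_padded = len(padded) - 1

# Per-request state still has one entry per real request only.
num_computed_tokens = [10, 4, 0]
shapes_match = len(num_computed_tokens) == num_reqs_padded
print(num_reqs, num_reqs_padded, shapes_match)
```

Here the per-request state has `num_reqs` entries while the metadata claims `num_reqs_padded` requests, which is the kind of mismatch that surfaces as a dimension error downstream.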

When SP is enabled, the fix modifies the restrictive condition
`num_tokens_padded == num_tokens_unpadded` used when reverting the dummy
request padding. SP padding is stripped after communication, so it
should not be treated as an additional request in the attention
metadata.
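
One plausible reading of the modified condition, as a minimal sketch. The helper `resolve_num_reqs_padded` and the `enable_sp` flag are hypothetical names used for illustration only; the real change lives in the model runner and may be structured differently:

```python
def resolve_num_reqs_padded(
    num_reqs: int,
    num_reqs_padded: int,
    num_tokens_padded: int,
    num_tokens_unpadded: int,
    enable_sp: bool,
) -> int:
    """Decide whether to revert the dummy request (hypothetical helper).

    Assumed old behaviour: revert only when no token padding was added.
    Assumed new behaviour: also revert when SP is enabled, because SP
    padding is stripped after communication and is not a real request.
    """
    if enable_sp or num_tokens_padded == num_tokens_unpadded:
        return num_reqs
    return num_reqs_padded


# SP enabled, 7 real tokens padded to 8: the dummy request is dropped.
with_sp = resolve_num_reqs_padded(3, 4, 8, 7, enable_sp=True)
# SP disabled, same padding: the dummy request is kept.
without_sp = resolve_num_reqs_padded(3, 4, 8, 7, enable_sp=False)
print(with_sp, without_sp)
```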

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
vLLM version: v0.18.0
vLLM-Ascend version: releases/v0.18.0

Signed-off-by: Wangbj127 <wangbj1207@126.com>
2026-04-17 22:50:22 +08:00