[Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 (#1416)
### What this PR does / why we need it?

Reset all unused positions in `NPUModelRunner` to prevent out-of-bounds asserts in the `GatherV3` operator.

Currently, in [`get_splitfuse_attn_mask`](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/attention/attention.py#L124), the `position` tensor may contain values that exceed the dimensions of the attention mask, triggering a `GatherV3` boundary-check failure. These invalid indices are stale "dirty" entries left over in `position` by the padding logic of the ACL graph. Specifically, in [`_process_reqs`](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L989), `num_input_tokens` is always greater than or equal to `total_num_scheduled_tokens`, so any positions not explicitly cleared after a previous batch persist and cause this sporadic error.

Note that in the original vLLM implementation, masks are constructed internally from other arguments, so these lingering values never surface. On the Ascend platform, however, split-fuse attention requires externally supplied masks, and these residual indices become critical, leading to an elusive, hard-to-reproduce failure.

The fix is to explicitly zero out all unused entries in the `positions` tensor before it is consumed, ensuring that every index lies within the valid range of the attention mask.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1038

### Does this PR introduce _any_ user-facing change?

No

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
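The failure mode and the fix can be illustrated with a minimal sketch. This is not the vLLM-Ascend code itself, just a NumPy model of it: `MASK_LEN`, the concrete buffer values, and the gather at the end are all hypothetical, standing in for the attention-mask lookup that `GatherV3` performs on the NPU.

```python
import numpy as np

MASK_LEN = 8                     # hypothetical attention-mask dimension
num_input_tokens = 6             # padded batch size (ACL graph padding)
total_num_scheduled_tokens = 4   # tokens actually scheduled this step

# Persistent positions buffer: a previous, longer batch left stale
# indices (9 and 11, both >= MASK_LEN) in the padded tail.
positions = np.array([0, 1, 2, 3, 9, 11])

# The bug: gathering mask rows with these indices would be out of
# bounds -- the analogue of the GatherV3 boundary-check failure.
assert (positions[:num_input_tokens] >= MASK_LEN).any()

# The fix: zero the unused tail before the gather.
positions[total_num_scheduled_tokens:num_input_tokens] = 0
assert (positions[:num_input_tokens] < MASK_LEN).all()

# Now the gather is safe.
mask = np.tril(np.ones((MASK_LEN, MASK_LEN), dtype=np.int8))
rows = mask[positions[:num_input_tokens]]
print(rows.shape)  # (6, 8)
```

The padded entries gather row 0 of the mask, which is harmless because those slots correspond to padding tokens whose output is discarded anyway.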
```diff
@@ -953,6 +953,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
                 self.mrope_positions_cpu[:, :total_num_scheduled_tokens],
                 non_blocking=True)
+        self.positions[total_num_scheduled_tokens:num_input_tokens].zero_()
         self.positions[:total_num_scheduled_tokens].copy_(
             self.positions_cpu[:total_num_scheduled_tokens], non_blocking=True)
         positions = self.positions[:num_input_tokens]
```