[Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (#6469)

### What this PR does / why we need it?
Fix the incorrect use of the `output` parameter in
`_forward_fia_slidingwindow`:
```
# Original (incorrect)
output, _ = torch_npu.npu_fused_infer_attention_score(...)
output = output.view(batch_size, self.num_heads, self.head_size)
```

In the original code, the `output` parameter was simply rebound to a new
local value. This is inconsistent with the interface definition, where
`output` is a caller-provided buffer that must be updated in place, so the
caller never received the result. The fix writes into that buffer instead:

```
attn_output, _ = torch_npu.npu_fused_infer_attention_score(...)
attn_output = attn_output.view(batch_size, self.num_heads, self.head_size)
output[:batch_size] = attn_output[:batch_size]
```
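The distinction comes down to Python's assignment semantics: rebinding a parameter name only changes a local reference, while slice assignment mutates the object the caller passed in. A minimal, framework-free sketch of the two patterns (plain lists stand in for the preallocated output tensor; both helper names are hypothetical):

```python
def fill_rebind(output):
    # Rebinds the local name 'output' to a new object;
    # the caller's buffer is never touched.
    output = [1, 2, 3]


def fill_inplace(output):
    result = [1, 2, 3]       # analogous to attn_output in the fix
    output[:3] = result[:3]  # slice assignment writes into the caller's buffer


buf = [0, 0, 0]
fill_rebind(buf)
print(buf)   # [0, 0, 0] -- the rebound value was lost
fill_inplace(buf)
print(buf)   # [1, 2, 3] -- the caller sees the update
```

The same rule applies to tensors: `output = output.view(...)` replaces the local reference, while `output[:batch_size] = ...` copies data into the tensor the caller supplied.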

### Does this PR introduce _any_ user-facing change?
No change.

Co-authored-by: GoCHug <gch59135228@163.com>

### How was this patch tested?
vLLM Ascend version: v0.13.0rc1

Signed-off-by: acat-rw <892882856@qq.com>
Commit 8e66299bf1 (parent 922e5c163b) by Ruowei Zheng, 2026-02-05 20:58:54 +08:00, committed by GitHub.

```
@@ -727,7 +727,7 @@ class AscendAttentionBackendImpl(AttentionImpl):
         key = self.key_cache.flatten(2, 3).contiguous()
         value = self.value_cache.flatten(2, 3).contiguous()
-        output, _ = torch_npu.npu_fused_infer_attention_score(
+        attn_output, _ = torch_npu.npu_fused_infer_attention_score(
             query,
             key,
             value,
@@ -742,7 +742,8 @@ class AscendAttentionBackendImpl(AttentionImpl):
             actual_seq_lengths_kv=attn_metadata.seq_lens,
         )
-        output = output.view(batch_size, self.num_heads, self.head_size)
+        attn_output = attn_output.view(batch_size, self.num_heads, self.head_size)
+        output[:batch_size] = attn_output[:batch_size]
         return output
     def forward_fused_infer_attention(
```