[v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)

### What this PR does / why we need it?
A zero-fill ("zero-like") operator runs before the attention operation in every decoding step. Analysis showed that its output is never actually needed, so the operator can be eliminated. This PR removes it to improve decode performance.
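The core idea can be illustrated with a minimal, hypothetical sketch (plain Python, not the actual kernel code): when a buffer's contents are fully produced by the next operator anyway, pre-filling it with zeros is pure overhead and can be skipped without changing the result. `fake_attention` below is an invented stand-in, not the real attention implementation.

```python
def fake_attention(out):
    """Hypothetical stand-in for the attention kernel: writes every element of `out`."""
    for i in range(len(out)):
        out[i] = 0.1 * i
    return out

num_tokens, hidden_size = 4, 8

# Before: allocate, zero-fill (the "zero-like" operator), then overwrite everything.
before = fake_attention([0.0] * (num_tokens * hidden_size))

# After: skip the zero-fill; every element is written anyway, so the result is identical.
after = fake_attention([None] * (num_tokens * hidden_size))

assert before == after
```

Since the zero-fill launches a separate device operator per decoding step, dropping it removes one kernel launch and one full write of the output buffer from the hot decode path.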

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
Author: ZYang6263
Date: 2025-10-23 14:49:28 +08:00
Committed by: GitHub
Parent: 74903af460
Commit: 6975d46627
9 changed files with 111 additions and 6 deletions


```diff
@@ -350,7 +350,7 @@ class AscendAttentionTorchairBackendImpl(AttentionImpl):
             return output.view(num_tokens, self.hidden_size)
         if attn_metadata is None:
-            return output.view(num_tokens, self.hidden_size).fill_(0)
+            return output.view(num_tokens, self.hidden_size)
         output = output.view(-1, self.num_heads, self.head_size)
```