[v0.11.0][Perf] Eliminate the zero-like operator through a patch (#3632)

### What this PR does / why we need it?
There is a zero-like operator before the attention operation in each
decoding step. Analysis shows the zero-initialization is redundant, since
the tensor it produces is fully overwritten downstream, so the operator
can be eliminated. This PR removes it to improve decode performance.
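A rough sketch of the idea (illustrative only, not code from this PR; the patch itself uses `torch.empty`):

```python
import torch

q = torch.randn(8, 128, 4096)

# Before: zeros_like allocates the output buffer AND fills it with zeros.
out = torch.zeros_like(q)

# After: when the attention kernel overwrites every element anyway,
# the zero-fill is wasted work; an empty allocation is sufficient.
out = torch.empty_like(q)
```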

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
ZYang6263 authored on 2025-10-23 14:49:28 +08:00, committed by GitHub
parent 74903af460 · commit 6975d46627
9 changed files with 111 additions and 6 deletions


@@ -160,3 +160,15 @@
# Future Plan:
# Remove this patch when adapted vllm version contains the above PR.
#
# ** File: worker/patch_attention_layer.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.attention.layer.Attention.forward`
# Why:
#     There is a zero-like operator before the attention operation in each decoding step.
#   How:
#     Replace this zero-like operator with torch.empty.
# Related PR (if no, explain why):
# - https://github.com/vllm-project/vllm/pull/26680
# Future Plan:
#     Remove this patch once the adapted vLLM version contains the above PR.
#
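For context, a minimal self-contained sketch of the monkey-patch mechanism such a file might use. The `Attention` class below is a stand-in, not the real `vllm.attention.layer.Attention`, and the `copy_` call stands in for the real attention kernel:

```python
import torch


class Attention(torch.nn.Module):
    """Stand-in for vllm.attention.layer.Attention (illustration only)."""

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # Original behavior: zero-initialize the output buffer even
        # though every element is overwritten afterwards.
        output = torch.zeros_like(query)
        output.copy_(query)  # stand-in for the real attention kernel
        return output


def patched_forward(self, query: torch.Tensor) -> torch.Tensor:
    # Patched behavior: only allocate; skip the redundant zero-fill.
    output = torch.empty_like(query)
    output.copy_(query)  # stand-in for the real attention kernel
    return output


# Applied at import time, the way a worker/patch_attention_layer.py
# module would replace the method on the real class.
Attention.forward = patched_forward
```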