[v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)
### What this PR does / why we need it?

There is a zero-like operator before the attention operation in each decoding stage. After analysis, this operator can be eliminated. This PR removes the operator to improve performance.

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
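The optimization described above follows a common pattern: if the attention kernel writes every element of its output buffer, a preceding zero-fill of that buffer is pure overhead and an uninitialized allocation suffices. Below is a minimal NumPy sketch of that idea, not the actual vllm-ascend patch; `attention_kernel` and both `forward_*` helpers are hypothetical stand-ins.

```python
import numpy as np

def attention_kernel(query, out):
    # Stand-in for the real attention op: it writes EVERY element of
    # `out`, so whatever was in the buffer beforehand is irrelevant.
    out[:] = query * 2.0  # hypothetical placeholder computation
    return out

def forward_with_zeros(query):
    # Original pattern: zero-fill the output buffer before the kernel.
    out = np.zeros_like(query)   # extra fill pass over the whole buffer
    return attention_kernel(query, out)

def forward_without_zeros(query):
    # Optimized pattern: uninitialized allocation; safe only because the
    # kernel overwrites the entire buffer.
    out = np.empty_like(query)   # no fill pass
    return attention_kernel(query, out)

q = np.ones((2, 4), dtype=np.float32)
assert np.array_equal(forward_with_zeros(q), forward_without_zeros(q))
```

The two forward paths produce identical results; the second simply skips one memory-fill pass per decode step, which is the saving this PR targets.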
@@ -28,3 +28,4 @@ import vllm_ascend.patch.worker.patch_weight_loader  # noqa
 import vllm_ascend.patch.worker.patch_multimodal_merge  # noqa
 import vllm_ascend.patch.worker.patch_minicpm  # noqa
 import vllm_ascend.patch.worker.patch_deepseek_mtp  # noqa
+import vllm_ascend.patch.worker.patch_attention_layer  # noqa