Refactor AscendMultiHeadLatentAttention (#2826)
### What this PR does / why we need it?
Register `AscendMultiHeadLatentAttention` as a `CustomOp`, following upstream vLLM changes.
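For context, a minimal sketch of the out-of-tree (OOT) registration pattern this PR relies on. This is not vLLM's actual implementation; the registry dict and fallback lookup here are illustrative assumptions that mirror how `CustomOp.register_oot` is called in the diff below.

```python
class CustomOp:
    # Hypothetical registry: maps a registered name to the overriding class.
    op_registry_oot: dict = {}

    @classmethod
    def register_oot(cls, _decorated_op_cls=None, name: str = ""):
        # Usable as a decorator or as a direct call, matching the call
        # sites in register_ascend_customop().
        def decorator(op_cls):
            cls.op_registry_oot[name] = op_cls
            return op_cls
        if _decorated_op_cls is not None:
            return decorator(_decorated_op_cls)
        return decorator


class MultiHeadLatentAttention:
    def forward(self):
        return "default MLA"


# Stand-in for vllm_ascend.models.layers.mla.AscendMultiHeadLatentAttention.
class AscendMultiHeadLatentAttention(MultiHeadLatentAttention):
    def forward(self):
        return "Ascend MLA"


# Direct-call registration, as done in this PR:
CustomOp.register_oot(_decorated_op_cls=AscendMultiHeadLatentAttention,
                      name="MultiHeadLatentAttention")

# Dispatch: look up the override by name, falling back to the default class.
op_cls = CustomOp.op_registry_oot.get("MultiHeadLatentAttention",
                                      MultiHeadLatentAttention)
print(op_cls().forward())  # -> Ascend MLA
```

The direct-call form (passing `_decorated_op_cls` explicitly) lets an out-of-tree plugin register overrides at import time without decorating the class definition itself.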
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: main
- vLLM main:
b23fb78623
---------
Signed-off-by: Icey <1790571317@qq.com>
```diff
@@ -529,6 +529,10 @@ def register_ascend_customop():
     from vllm_ascend.ops.common_fused_moe import AscendFusedMoE
     CustomOp.register_oot(_decorated_op_cls=AscendFusedMoE, name="FusedMoE")
 
+    from vllm_ascend.models.layers.mla import AscendMultiHeadLatentAttention
+    CustomOp.register_oot(_decorated_op_cls=AscendMultiHeadLatentAttention,
+                          name="MultiHeadLatentAttention")
+
     # NOTE: Keep this at last to ensure all custom actions are registered
     _ASCEND_CUSTOMOP_IS_REIGISTERED = True
```