[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841)

### What this PR does / why we need it?
Starting from this PR, we'll refactor `CustomOp` in vllm-ascend.

Use `CustomOp.register_oot` to register the custom op, taking `AscendQuickGELU` as an example:
```python
from vllm_ascend.ops.activation import AscendQuickGELU
CustomOp.register_oot(_decorated_op_cls=AscendQuickGELU, name="QuickGELU")
```

This is a quick adaptation to the `CustomOp.register_oot` mechanism introduced in vLLM 0.9.2. As a further step, we can drop the inheritance from `QuickGELU` and write our own `QuickGELU` implementation from scratch.
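
As a rough sketch of that further step (an illustration only, not part of this PR): a standalone op could subclass `CustomOp` directly, with `forward_native` carrying the reference implementation copied from upstream `QuickGELU` and `forward_oot` carrying the NPU path, then be registered with the same `CustomOp.register_oot` call shown above.
```python
import torch
from vllm.model_executor.custom_op import CustomOp


# Sketch: a QuickGELU that no longer inherits from vLLM's QuickGELU.
class AscendQuickGELU(CustomOp):

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Reference PyTorch path, same math as upstream QuickGELU.
        return x * torch.sigmoid(1.702 * x)

    def forward_oot(self, x: torch.Tensor) -> torch.Tensor:
        # Ascend path, selected by CustomOp dispatch on out-of-tree platforms.
        import torch_npu
        return torch_npu.npu_fast_gelu(x)
```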

Part of https://github.com/vllm-project/vllm-ascend/pull/1647



- vLLM version: v0.9.2
- vLLM main: 8dfb45ca33

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Author: Mengqing Cao
Date: 2025-07-18 23:07:14 +08:00 (committed by GitHub)
Parent: 8a91e6e59c
Commit: 574fe407eb

8 changed files with 154 additions and 22 deletions


```diff
@@ -18,25 +18,25 @@
 import torch
 from vllm.model_executor.layers.activation import QuickGELU, SiluAndMul
 
 from vllm_ascend.utils import is_310p
 
 
-def silu_and_mul_forward_oot(self, x: torch.Tensor) -> torch.Tensor:
-    import torch_npu
-
-    if is_310p():
-        out = torch_npu.npu_swiglu(x.to(torch.float32)).to(torch.float16)
-    else:
-        out = torch_npu.npu_swiglu(x)
-    return out
-
-
-def quick_gelu_forward_oot(self, x: torch.tensor) -> torch.Tensor:
-    import torch_npu
-
-    out = torch_npu.npu_fast_gelu(x)
-    return out
-
-
-QuickGELU.forward_oot = quick_gelu_forward_oot
-SiluAndMul.forward_oot = silu_and_mul_forward_oot
+class AscendQuickGELU(QuickGELU):
+
+    def forward_oot(self, x: torch.tensor) -> torch.Tensor:
+        import torch_npu
+
+        out = torch_npu.npu_fast_gelu(x)
+        return out
+
+
+class AscendSiluAndMul(SiluAndMul):
+
+    def forward_oot(self, x: torch.Tensor) -> torch.Tensor:
+        import torch_npu
+
+        if is_310p():
+            out = torch_npu.npu_swiglu(x.to(torch.float32)).to(torch.float16)
+        else:
+            out = torch_npu.npu_swiglu(x)
+        return out
```
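
For context, the registration side of this change could be wired up roughly as below; the helper name `register_ascend_customop` and the `AscendSiluAndMul` registration name are assumptions that mirror the `AscendQuickGELU` example from the description, and the actual registration site lives in one of the other changed files of this commit.
```python
# Hypothetical wiring sketch; only the AscendQuickGELU call is taken
# verbatim from the PR description, the rest mirrors it by analogy.
from vllm.model_executor.custom_op import CustomOp

from vllm_ascend.ops.activation import AscendQuickGELU, AscendSiluAndMul


def register_ascend_customop() -> None:
    # Replace vLLM's in-tree activation ops with the Ascend implementations.
    CustomOp.register_oot(_decorated_op_cls=AscendQuickGELU, name="QuickGELU")
    CustomOp.register_oot(_decorated_op_cls=AscendSiluAndMul, name="SiluAndMul")
```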