From e0c5073956eec0cfe46ceedb2f2bc2b6cc0da2f2 Mon Sep 17 00:00:00 2001 From: Wang Yixuan <88923622+hust17yixuan@users.noreply.github.com> Date: Sat, 6 Dec 2025 10:52:46 +0800 Subject: [PATCH] [Bugfix]fix bmm_transpose ops for cann version (#4653) ### What this PR does / why we need it? Due to the CANN version upgrade, the custom op cannot be used in higher versions. In higher CANN versions, the op launches redundant vector cores even though it only uses cube cores; this results in misalignment when copying data from UB memory to global memory. So a restriction is added to the op to make it use cube cores only. ### Does this PR introduce _any_ user-facing change? No - vLLM version: v0.12.0 - vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9 --------- Signed-off-by: hust17yixuan <303660421@qq.com> Co-authored-by: wangxiyuan --- .../op_kernel/batch_matmul_transpose_kernel.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/csrc/batch_matmul_transpose/op_kernel/batch_matmul_transpose_kernel.cpp b/csrc/batch_matmul_transpose/op_kernel/batch_matmul_transpose_kernel.cpp index 81d987ba..33dba9a9 100644 --- a/csrc/batch_matmul_transpose/op_kernel/batch_matmul_transpose_kernel.cpp +++ b/csrc/batch_matmul_transpose/op_kernel/batch_matmul_transpose_kernel.cpp @@ -658,6 +658,7 @@ private: extern "C" __global__ __aicore__ void batch_matmul_transpose(GM_ADDR gm_a, GM_ADDR gm_b, GM_ADDR gm_c, GM_ADDR gm_tiling_data) { + KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_AIC_ONLY); PpMatmulEinSum<0, false, false, half, half, DataFormat::ND> einsum_0_n_fp16_nd; // swizzleDir[0] transA[0] transB[0] DtypeA[001] DtypeB[001] DtypeC[001] DataFormatA[0] // DataFormatB[0]