Files
xc-llm-ascend/csrc/kernels/math_utils.h
Wang Yixuan c68ddc11ce [OPS] add bmm_transpose ops (#3990)
### What this PR does / why we need it?
Add a new fused op to custom_op that combines torch.bmm() with the
subsequent transpose to achieve better performance. This op is used in
mla_v1 to replace the separate bmm and transpose.
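To illustrate the pattern being fused (a sketch only, not the PR's kernel): the unfused path computes a batched matrix multiply and then swaps the last two dimensions of the result. The pure-Python helpers below are hypothetical stand-ins for torch.bmm and Tensor.transpose, written out to show the exact semantics the fused op must reproduce.

```python
def bmm(a, b):
    """Batched matmul: a is [B, M, K], b is [B, K, N] -> [B, M, N]."""
    return [
        [
            [sum(a[i][m][k] * b[i][k][n] for k in range(len(b[i])))
             for n in range(len(b[i][0]))]
            for m in range(len(a[i]))
        ]
        for i in range(len(a))
    ]

def transpose_last_two(x):
    """Swap the last two dims: [B, M, N] -> [B, N, M]."""
    return [[list(col) for col in zip(*mat)] for mat in x]

# One batch of 2x2 matrices.
a = [[[1, 2], [3, 4]]]
b = [[[5, 6], [7, 8]]]
out = transpose_last_two(bmm(a, b))  # what bmm_transpose computes in one pass
```

A fused kernel produces `out` directly, avoiding the intermediate [B, M, N] buffer and the extra memory pass that a separate transpose requires.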

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?


- vLLM version: v0.11.2

---------

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-12-01 09:09:51 +08:00


#ifndef KERNEL_MATH_UTILS_H
#define KERNEL_MATH_UTILS_H

#include <cstdint>

namespace device_utils {

// Round val up to the next multiple of roundVal (roundVal must be > 0).
template <typename T, T roundVal>
__aicore__ __force_inline__ T RoundUp(const T &val)
{
    return (val + roundVal - 1) / roundVal * roundVal;
}

} // namespace device_utils

#endif // KERNEL_MATH_UTILS_H