Trunrain
|
ba9cda9dfd
|
[Kernel] add custom op MatmulAllreduceAddRmsnorm (#4606)
What this PR does / why we need it?
Adds the fused custom operator MatmulAllreduceAddRmsnorm as an optimization
for Qwen3 32B. It combines Matmul, AllReduce, Add, and RMSNorm into a single op.
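For readers unfamiliar with the pattern, the following is a minimal, unfused reference sketch of what a Matmul + AllReduce + Add + RMSNorm sequence computes. It is not the kernel from this PR. The function names, the argument layout, the `eps` default, and the choice to return both the normalized output and the updated residual are all assumptions made for illustration, and AllReduce is simulated by summing per-rank partial results in plain Python.

```python
import math


def matmul(x, w):
    """Multiply x ([m][k]) by w ([k][n]) and return an [m][n] nested list."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def all_reduce_sum(partials):
    """Simulate AllReduce(sum): element-wise sum of one [m][n] matrix per rank."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


def rms_norm(row, gamma, eps=1e-6):
    """RMSNorm over one row: row / sqrt(mean(row^2) + eps), scaled by gamma."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
    return [v * inv_rms * g for v, g in zip(row, gamma)]


def matmul_allreduce_add_rmsnorm_ref(x_shards, w_shards, residual, gamma, eps=1e-6):
    """Hypothetical reference for the four steps, run one after another.

    x_shards / w_shards hold one input and one weight shard per simulated rank.
    Returns (normalized output, updated residual); this pair is an assumption.
    """
    # 1. Matmul: each rank multiplies its own shard.
    partials = [matmul(x, w) for x, w in zip(x_shards, w_shards)]
    # 2. AllReduce: sum the partial products across ranks.
    reduced = all_reduce_sum(partials)
    # 3. Add: fold in the residual stream.
    added = [[r + y for r, y in zip(res_row, y_row)]
             for res_row, y_row in zip(residual, reduced)]
    # 4. RMSNorm: normalize each row and apply the learned scale.
    normed = [rms_norm(row, gamma, eps) for row in added]
    return normed, added
```

A fused operator performs these steps in a single op, which avoids launching them separately and materializing every intermediate result between them.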
Does this PR introduce _any_ user-facing change?
No
How was this patch tested?
vLLM version: v0.11.2
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Co-authored-by: tongrunze <t00574058@china.huawei.com>
|
2025-12-10 09:05:33 +08:00 |
|