The bmm_transpose operator in version 3.2 is only used in the decoding stage due to shape limitations.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: ChrisGelhLan <33011886+xlan-huawei@users.noreply.github.com>
ad32e3e19c
dispatch_gmm_combine