Commit Graph

13 Commits

Author SHA1 Message Date
Baizhou Zhang
282eb59ff3 Add bf16 output option for dsv3_router_gemm kernel (#7999) 2025-07-20 09:49:37 +08:00
Baizhou Zhang
7248272ccc Add dsv3 router gemm kernel (#7627) 2025-06-29 23:31:55 -07:00
Ke Bao
04b35190e2 Add dsv3 fused a gemm to sgl-kernel (#7630) 2025-06-29 02:52:24 -07:00
fzyzcjy
5c66c4424f Support new DeepGEMM format in per token group quant (#7146) 2025-06-13 02:00:22 -07:00
Pavani Majety
eb38c7d1ca [1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-06-02 13:48:03 -07:00
HandH1998
4d643f6c7a [1/2] Support Qserve (#6457)
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-05-21 19:48:59 -07:00
Yineng Zhang
136b8e6afb fix: remove cublas_grouped_gemm (#5307) 2025-04-11 16:22:37 -07:00
Yineng Zhang
31dfff7da7 use default for torch.ops (#4835) 2025-03-27 19:09:58 -07:00
Trevor Morris
e9f8e42318 Support FP4 gemm (1/2) (#3899) 2025-03-24 19:50:23 -07:00
Chunan Zeng
65c24c28f9 [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) 2025-03-23 23:44:17 -07:00
Yineng Zhang
0a3960f21f fix awq_dequantize (#4333) 2025-03-12 01:04:38 -07:00
Rex
07f944631e Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) 2025-03-12 00:10:02 -07:00
Lianmin Zheng
8abf74e3c9 Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-03-08 22:54:51 -08:00