Commit Graph

19 Commits

Author SHA1 Message Date
Yineng Zhang
a53454c55e fix: sgl-kernel link cuda (#2906) 2025-01-16 04:53:23 +08:00
yizhang2077
6cb3974e77 optimize custom allreduce kernel (#2904) 2025-01-16 03:04:25 +08:00
Xiaoyu Zhang
e2b16c4716 add sampling_scaling_penalties kernel (#2846) 2025-01-12 19:38:17 -08:00
Ke Bao
58f9060efe Update int8 gemm config (#2774) 2025-01-07 19:47:37 +08:00
Ke Bao
0f3eb1d294 Support cutlass Int8 gemm (#2752) 2025-01-06 22:51:22 +08:00
Ke Bao
06dd2eab84 Remove unused var in moe_align_kernel (#2751) 2025-01-06 22:13:28 +08:00
Ke Bao
439f65809f Fix sgl-kernel cu118 compile issue (#2750) 2025-01-06 21:59:31 +08:00
yizhang2077
3900a94afe Support twoshot kernel (#2688) 2025-01-06 00:47:16 +08:00
Xiaoyu Zhang
ded9fcd09a improve moe_align_kernel for deepseek v3 (#2735) 2025-01-06 00:28:22 +08:00
Yineng Zhang
b6b57fc200 minor: cleanup sgl-kernel (#2679) 2024-12-31 14:52:00 +08:00
Ke Bao
b02da24a5b Refactor sgl-kernel build (#2642) 2024-12-30 18:07:01 +08:00
HandH1998
77d1210b36 fix moe_align_block_size (#2615) 2024-12-27 23:32:53 +08:00
Yineng Zhang
2dccecf432 fix: only enable moe_align_block_size for now (#2590) 2024-12-26 16:56:59 +08:00
Yineng Zhang
31548116a8 fix moe_align_block_size_kernel for shared memory issue (#2579) 2024-12-26 05:31:04 +08:00
Co-authored-by: ispobock <ispobaoke@163.com>
yizhang2077
e04d3f2897 adapt tensorrt llm custom all reduce to sgl-kernel (#2481) 2024-12-15 13:15:59 +08:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang
fccbfa3752 format: add clang-format for sgl-kernel (#2483) 2024-12-14 22:36:04 +08:00
Yineng Zhang
28bc60dcab misc: update build setup (#2306) 2024-12-02 02:03:49 +08:00
Yineng Zhang
47eb139f81 feat: use warp reduce as a simple example (#2304) 2024-12-01 22:43:50 +08:00
Yineng Zhang
5c91a315d7 feat: support sgl-kernel pypi (#2302) 2024-12-01 20:11:21 +08:00