sglang

Author	SHA1	Message	Date
Yineng Zhang	a53454c55e	fix: sgl-kernel link cuda (#2906)	2025-01-16 04:53:23 +08:00
yizhang2077	6cb3974e77	optimize custom allreduce kernel (#2904)	2025-01-16 03:04:25 +08:00
Xiaoyu Zhang	e2b16c4716	add sampling_scaling_penalties kernel (#2846)	2025-01-12 19:38:17 -08:00
Ke Bao	58f9060efe	Update int8 gemm config (#2774)	2025-01-07 19:47:37 +08:00
Ke Bao	0f3eb1d294	Support cutlass Int8 gemm (#2752)	2025-01-06 22:51:22 +08:00
Ke Bao	06dd2eab84	Remove unused var in moe_align_kernel (#2751)	2025-01-06 22:13:28 +08:00
Ke Bao	439f65809f	Fix sgl-kernel cu118 compile issue (#2750)	2025-01-06 21:59:31 +08:00
yizhang2077	3900a94afe	Support twoshot kernel (#2688)	2025-01-06 00:47:16 +08:00
Xiaoyu Zhang	ded9fcd09a	improve moe_align_kernel for deepseek v3 (#2735)	2025-01-06 00:28:22 +08:00
Yineng Zhang	b6b57fc200	minor: cleanup sgl-kernel (#2679)	2024-12-31 14:52:00 +08:00
Ke Bao	b02da24a5b	Refactor sgl-kernel build (#2642)	2024-12-30 18:07:01 +08:00
HandH1998	77d1210b36	fix moe_align_block_size (#2615)	2024-12-27 23:32:53 +08:00
Yineng Zhang	2dccecf432	fix: only enable moe_align_block_size for now (#2590)	2024-12-26 16:56:59 +08:00
Yineng Zhang	31548116a8	fix moe_align_block_size_kernel for shared memory issue (#2579) Co-authored-by: ispobock <ispobaoke@163.com>	2024-12-26 05:31:04 +08:00
yizhang2077	e04d3f2897	adapt tensorrt llm custom all reduce to sgl-kernel (#2481) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-15 13:15:59 +08:00
Yineng Zhang	fccbfa3752	format: add clang-format for sgl-kernel (#2483)	2024-12-14 22:36:04 +08:00
Yineng Zhang	28bc60dcab	misc: update build setup (#2306)	2024-12-02 02:03:49 +08:00
Yineng Zhang	47eb139f81	feat: use warp reduce as a simple example (#2304)	2024-12-01 22:43:50 +08:00
Yineng Zhang	5c91a315d7	feat: support sgl-kernel pypi (#2302)	2024-12-01 20:11:21 +08:00

19 Commits