sglang

Author	SHA1	Message	Date
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Byron Hsu	514f37c32b	[kernel] Fix position ids in rope (#3173)	2025-01-27 17:09:51 +08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)	2025-01-27 15:28:00 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
Xiaoyu Zhang	5d9d15e70f	support fp32 in sampling_scaling_penalties kernel (#3121)	2025-01-25 16:52:17 +08:00
Yineng Zhang	5de4051bcf	feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com>	2025-01-24 01:54:47 +08:00
Xiaoyu Zhang	e0cd65c2b6	[hotfix] fix test_sampling_scaling_penalties.py ci test (#3084)	2025-01-24 00:33:59 +08:00
Xiaoyu Zhang	f1b6861828	use flashinfer vec_dtypes in sgl_kernel (#3083)	2025-01-23 22:19:04 +08:00
Yineng Zhang	0da0989ad4	sync flashinfer and update sgl-kernel tests (#3081)	2025-01-23 21:13:55 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	bf669606eb	feat: integrate bmm_fp8 kernel into sgl-kernel (#3056)	2025-01-23 00:39:38 +08:00
Yineng Zhang	9d9b482a39	feat: integrate activation kernels into sgl-kernel (#3053)	2025-01-22 23:25:45 +08:00
Yineng Zhang	7353fb9b97	feat: integrate norm kernels into sgl-kernel (#3052)	2025-01-22 21:32:48 +08:00
Ke Bao	0ac019f171	Support sm90 Int8 gemm (#3035)	2025-01-21 22:21:54 +08:00
Yineng Zhang	5a0d680a14	feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033)	2025-01-21 20:44:49 +08:00
Byron Hsu	b5caa22dfb	[kernel] port rope cuda kernel to sgl-kernel (#2993) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-20 20:58:51 +08:00
yizhang2077	6cb3974e77	optimize custom allreduce kernel (#2904)	2025-01-16 03:04:25 +08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870)	2025-01-13 23:09:00 +08:00
Xiaoyu Zhang	e2b16c4716	add sampling_scaling_penalties kernel (#2846)	2025-01-12 19:38:17 -08:00
Ke Bao	58f9060efe	Update int8 gemm config (#2774)	2025-01-07 19:47:37 +08:00
Ke Bao	0f3eb1d294	Support cutlass Int8 gemm (#2752)	2025-01-06 22:51:22 +08:00
yizhang2077	3900a94afe	Support twoshot kernel (#2688)	2025-01-06 00:47:16 +08:00
Xiaoyu Zhang	ded9fcd09a	improve moe_align_kernel for deepseek v3 (#2735)	2025-01-06 00:28:22 +08:00
HandH1998	77d1210b36	fix moe_align_block_size (#2615)	2024-12-27 23:32:53 +08:00
Lianmin Zheng	dc3bee4815	Fix test and benchmark scripts (#2598)	2024-12-26 07:56:26 -08:00
Yineng Zhang	31548116a8	fix moe_align_block_size_kernel for shared memory issue (#2579) Co-authored-by: ispobock <ispobaoke@163.com>	2024-12-26 05:31:04 +08:00
Yineng Zhang	e8dbdf75bc	fix typo (#2487)	2024-12-15 13:44:55 +08:00
yizhang2077	e04d3f2897	adapt tensorrt llm custom all reduce to sgl-kernel (#2481) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-15 13:15:59 +08:00
Yineng Zhang	5c91a315d7	feat: support sgl-kernel pypi (#2302)	2024-12-01 20:11:21 +08:00

30 Commits