sglang

Author	SHA1	Message	Date
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089)	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)	2025-03-06 18:05:43 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077)	2025-03-04 21:23:47 -08:00
Liu Jinjie	926f8efc0c	remove unused max_jobs (#3607) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-04 04:23:39 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	6b45a21d16	Reorganize c++ source files in sgl-kernel with multiple folders (#4025)	2025-03-03 05:32:30 -08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Yineng Zhang	29daf498cd	fix cu118 link issue (#3421)	2025-02-09 18:16:44 +08:00
Yineng Zhang	f9905d59a8	support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-07 20:29:51 +08:00
Yineng Zhang	00fa7d0417	add copyright for sgl-kernel (#3270)	2025-02-03 21:34:44 +08:00
Yineng Zhang	3ee62235c6	revert the MoE dependence (#3230)	2025-01-31 16:51:41 +08:00
Yineng Zhang	222ce6f1da	add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>	2025-01-30 23:04:41 +08:00
Yineng Zhang	468d23cff9	update setup for sgl-kernel (#3214)	2025-01-30 19:47:50 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)	2025-01-27 15:28:00 +08:00
Yineng Zhang	f265d15b96	use self-hosted to build sgl-kernel (#3154)	2025-01-26 23:02:57 +08:00
Yineng Zhang	02431b9ad2	fix link in README (#3153)	2025-01-26 21:30:00 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
yinfan98	9286740eff	feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-26 02:55:08 +08:00
Yineng Zhang	04f0b4cbef	minor: update sgl-kernel setup (#3107)	2025-01-24 20:10:35 +08:00
Trevor Morris	685a5738a7	Allow local cutlass directory to be used in sgl-kernel build (#3037)	2025-01-24 03:59:47 -08:00
Ke Bao	6619f48e18	Fix cu118 group gemm compile issue (#3097)	2025-01-24 15:19:09 +08:00
Yineng Zhang	5de4051bcf	feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com>	2025-01-24 01:54:47 +08:00
Yineng Zhang	07a22cbba3	use env variable to control the build conf on the CPU build node (#3080)	2025-01-23 20:46:49 +08:00
Yineng Zhang	3d0bfa3e17	update version setup for sgl-kernel (#3079)	2025-01-23 19:45:25 +08:00
Lianmin Zheng	553f5a3ffe	Remove torch dependency in sgl-kernel (#3074)	2025-01-23 17:23:37 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	bf669606eb	feat: integrate bmm_fp8 kernel into sgl-kernel (#3056)	2025-01-23 00:39:38 +08:00
Yineng Zhang	bcda0c9ee6	sync the upstream updates of flashinfer (#3051)	2025-01-22 20:33:13 +08:00
Yineng Zhang	9f8f2c7f74	update norm cu (#3048)	2025-01-22 18:58:44 +08:00
Ke Bao	6fc37bd8ee	Fix sgl-kernel compile for sm80 (#3046)	2025-01-22 16:49:08 +08:00
Yineng Zhang	5a0d680a14	feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033)	2025-01-21 20:44:49 +08:00
Ke Bao	5dfcacfcb1	Add compile flags for cutlass 3.x (#3013) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-01-21 00:04:12 +08:00
Byron Hsu	b5caa22dfb	[kernel] port rope cuda kernel to sgl-kernel (#2993) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-20 20:58:51 +08:00
lukec	6f98c586bd	fix sgl-kernel setup.py (#2963)	2025-01-18 18:50:37 +08:00
Yineng Zhang	2dc957d421	fix setup for sgl kernel (#2917)	2025-01-16 18:17:34 +08:00
Yineng Zhang	a53454c55e	fix: sgl-kernel link cuda (#2906)	2025-01-16 04:53:23 +08:00
yizhang2077	6cb3974e77	optimize custom allreduce kernel (#2904)	2025-01-16 03:04:25 +08:00
Xiaoyu Zhang	e2b16c4716	add sampling_scaling_penalties kernel (#2846)	2025-01-12 19:38:17 -08:00
Ke Bao	0f3eb1d294	Support cutlass Int8 gemm (#2752)	2025-01-06 22:51:22 +08:00
Yineng Zhang	b6b57fc200	minor: cleanup sgl-kernel (#2679)	2024-12-31 14:52:00 +08:00
Ke Bao	b4403985d0	Add cutlass submodule for sgl-kernel (#2676)	2024-12-31 14:28:29 +08:00
Ke Bao	b02da24a5b	Refactor sgl-kernel build (#2642)	2024-12-30 18:07:01 +08:00
Yineng Zhang	31548116a8	fix moe_align_block_size_kernel for shared memory issue (#2579) Co-authored-by: ispobock <ispobaoke@163.com>	2024-12-26 05:31:04 +08:00
yizhang2077	e04d3f2897	adapt tensorrt llm custom all reduce to sgl-kernel (#2481) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-15 13:15:59 +08:00
Yineng Zhang	2673fa29d4	fix: set runtime path (#2466)	2024-12-12 18:05:48 +08:00

57 Commits