sglang

Author	SHA1	Message	Date
Xiaoyu Zhang	55a7ec388f	use warp shuffle style reduce and flashinfer vectorize (#3628)	2025-02-19 20:53:51 +08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
Xiaoyu Zhang	3efbdf68b9	fix sgl-kernel codestyle (#3563)	2025-02-14 18:05:52 +08:00
Yineng Zhang	e082142519	chore: bump 0.0.3.post6 sgl-kernel (#3555)	2025-02-14 08:55:15 +08:00
Xiaoyu Zhang	f076328bb7	fix moe_align_kernel shm init not sync bug (#3534)	2025-02-13 16:47:00 +08:00
Yineng Zhang	4430c0a513	chore: bump 0.0.3.post5 sgl-kernel (#3530)	2025-02-13 01:51:46 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Yineng Zhang	b96e92e6e6	chore: bump 0.0.3.post4 sgl-kernel (#3523)	2025-02-12 17:28:36 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Yineng Zhang	6239d0b2e7	chore: bump sgl-kernel v0.0.3.post3 (#3440)	2025-02-10 04:00:52 +08:00
Yineng Zhang	4cfd3add6d	support version in sgl-kernel (#3439)	2025-02-10 03:49:52 +08:00
Yineng Zhang	29daf498cd	fix cu118 link issue (#3421)	2025-02-09 18:16:44 +08:00
Yineng Zhang	f9905d59a8	support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-07 20:29:51 +08:00
Yineng Zhang	45c87e083f	fix undefined symbol cudaGetDriverEntryPointByVersion (#3372)	2025-02-07 19:32:45 +08:00
Xiaoyu Zhang	cdae77b03d	optimize moe_align_kernel cuda (#3347)	2025-02-07 00:53:46 +08:00
Yineng Zhang	adeee15204	fix sgl-kernel build failure on AMD (#3352)	2025-02-07 00:35:59 +08:00
Xiaoyu Zhang	ad3499858e	clean moe align block kernel code and add acc test (#3332)	2025-02-06 16:42:36 +08:00
HAI	2c1a695ff1	ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)	2025-02-04 21:44:44 +08:00
Yineng Zhang	00fa7d0417	add copyright for sgl-kernel (#3270)	2025-02-03 21:34:44 +08:00
Yineng Zhang	7876279ea7	update cutlass dependency (#3240)	2025-02-01 03:13:44 +08:00
Yineng Zhang	3ee62235c6	revert the MoE dependence (#3230)	2025-01-31 16:51:41 +08:00
Yineng Zhang	9602c2aac7	keep the parts needed for moe_kernels (#3218)	2025-01-31 00:39:47 +08:00
Yineng Zhang	e81d7f11de	add tensorrt_llm moe_gemm as 3rdparty (#3217)	2025-01-30 23:49:14 +08:00
Yineng Zhang	222ce6f1da	add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>	2025-01-30 23:04:41 +08:00
Yineng Zhang	468d23cff9	update setup for sgl-kernel (#3214)	2025-01-30 19:47:50 +08:00
Yineng Zhang	c38b5fb4f4	update 3rdparty and rms norm for sgl-kernel (#3213)	2025-01-30 19:32:21 +08:00
Xiaoyu Zhang	81262c7b72	clean up useless file (#3192)	2025-01-28 14:29:30 +08:00
Yineng Zhang	8a96f74988	chore: bump 0.0.3 for sgl-kernel (#3178) Co-authored-by: ispobock <ispobaoke@hotmail.com> Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com> Co-authored-by: HandH1998 <007aabbcc411@gmail.com> Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ByronHsu <byronhsu1230@gmail.com>	2025-01-27 20:29:28 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Byron Hsu	514f37c32b	[kernel] Fix position ids in rope (#3173)	2025-01-27 17:09:51 +08:00
Byron Hsu	741fccd7bf	Bump sgl kernel to 0.0.2.post19 (#3167)	2025-01-27 15:36:07 +08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)	2025-01-27 15:28:00 +08:00
Yineng Zhang	f265d15b96	use self-hosted to build sgl-kernel (#3154)	2025-01-26 23:02:57 +08:00
Yineng Zhang	02431b9ad2	fix link in README (#3153)	2025-01-26 21:30:00 +08:00
Yineng Zhang	318260c0fa	chore: bump 0.0.2.post18 for sgl-kernel (#3149)	2025-01-26 19:00:34 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
yinfan98	9286740eff	feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-26 02:55:08 +08:00
Yineng Zhang	896c07441e	update installation doc for sgl-kernel (#3129)	2025-01-26 00:00:13 +08:00
Yineng Zhang	14e754a868	chore: bump v0.0.2.post17 for sgl-kernel (#3125)	2025-01-25 20:43:02 +08:00
yizhang2077	98522149ff	mirror fix for custom allreduce (#3124)	2025-01-25 18:26:41 +08:00
Xiaoyu Zhang	5d9d15e70f	support fp32 in sampling_scaling_penalties kernel (#3121)	2025-01-25 16:52:17 +08:00
Ke Bao	a22f60a313	Add workflow for sgl-kernel cu118 release (#3109)	2025-01-24 22:30:30 +08:00
Yineng Zhang	04f0b4cbef	minor: update sgl-kernel setup (#3107)	2025-01-24 20:10:35 +08:00
Trevor Morris	685a5738a7	Allow local cutlass directory to be used in sgl-kernel build (#3037)	2025-01-24 03:59:47 -08:00
Yineng Zhang	153b414e83	minor: sync flashinfer and add turbomind as 3rdparty (#3105)	2025-01-24 19:22:39 +08:00
Ke Bao	6619f48e18	Fix cu118 group gemm compile issue (#3097)	2025-01-24 15:19:09 +08:00
Ke Bao	7bad7e75bf	Add shapes for int8 gemm benchmark (#3093)	2025-01-24 12:27:30 +08:00
Yineng Zhang	54bac8af0b	chore: bump sgl-kernel 0.0.2.post16 (#3087)	2025-01-24 01:57:48 +08:00

116 Commits