sglang

Author	SHA1	Message	Date
Yineng Zhang	6dea5c96bf	Revert "get the python version from env (#4729)" (#4863)	2025-03-28 08:07:48 -07:00
DavidChan	5eae67cb1f	get the python version from env (#4729)	2025-03-27 22:26:42 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
Yineng Zhang	9971dc2283	Revert "feat: Add FlashMLA submodule (#4449)" (#4470)	2025-03-16 01:30:05 -07:00
Ying Sheng	52a34d7448	Add greedy verification kernel (#4383)	2025-03-16 00:58:26 -07:00
Shi Shuai	81f431eded	feat: Add FlashMLA submodule (#4449) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-15 23:30:25 -07:00
Qingquan Song	61e4433caf	Add moe topk softmax templated from vllm (#4302)	2025-03-14 12:03:33 -07:00
Yineng Zhang	2937387a50	fix accuracy issue (#4376)	2025-03-13 02:06:22 -07:00
Elfie Guo	7c86671131	Support Blackwell Block Scale FP8 Gemm (#4278)	2025-03-12 14:17:11 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)	2025-03-12 00:10:02 -07:00
laixin	c553e1604c	DeepGemm integrate to sgl-kernel (#4165) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-03-10 00:35:07 -07:00
Lianmin Zheng	7c0541b385	Move activation.cu to sgl-kernel/elementwise (#4250)	2025-03-09 22:41:13 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243)	2025-03-09 20:15:13 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241)	2025-03-09 18:38:15 -07:00
Yineng Zhang	df84ab2a5b	update sgl-kernel 3rdparty (#4228)	2025-03-09 01:16:05 -08:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089)	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)	2025-03-06 18:05:43 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077)	2025-03-04 21:23:47 -08:00
Liu Jinjie	926f8efc0c	remove unused max_jobs (#3607) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-04 04:23:39 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	6b45a21d16	Reorganize c++ source files in sgl-kernel with multiple folders (#4025)	2025-03-03 05:32:30 -08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Yineng Zhang	29daf498cd	fix cu118 link issue (#3421)	2025-02-09 18:16:44 +08:00
Yineng Zhang	f9905d59a8	support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-07 20:29:51 +08:00
Yineng Zhang	00fa7d0417	add copyright for sgl-kernel (#3270)	2025-02-03 21:34:44 +08:00
Yineng Zhang	3ee62235c6	revert the MoE dependence (#3230)	2025-01-31 16:51:41 +08:00
Yineng Zhang	222ce6f1da	add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>	2025-01-30 23:04:41 +08:00
Yineng Zhang	468d23cff9	update setup for sgl-kernel (#3214)	2025-01-30 19:47:50 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)	2025-01-27 15:28:00 +08:00
Yineng Zhang	f265d15b96	use self-hosted to build sgl-kernel (#3154)	2025-01-26 23:02:57 +08:00
Yineng Zhang	02431b9ad2	fix link in README (#3153)	2025-01-26 21:30:00 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
yinfan98	9286740eff	feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-26 02:55:08 +08:00
Yineng Zhang	04f0b4cbef	minor: update sgl-kernel setup (#3107)	2025-01-24 20:10:35 +08:00
Trevor Morris	685a5738a7	Allow local cutlass directory to be used in sgl-kernel build (#3037)	2025-01-24 03:59:47 -08:00
Ke Bao	6619f48e18	Fix cu118 group gemm compile issue (#3097)	2025-01-24 15:19:09 +08:00
Yineng Zhang	5de4051bcf	feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com>	2025-01-24 01:54:47 +08:00
Yineng Zhang	07a22cbba3	use env variable to control the build conf on the CPU build node (#3080)	2025-01-23 20:46:49 +08:00
Yineng Zhang	3d0bfa3e17	update version setup for sgl-kernel (#3079)	2025-01-23 19:45:25 +08:00
Lianmin Zheng	553f5a3ffe	Remove torch dependency in sgl-kernel (#3074)	2025-01-23 17:23:37 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	bf669606eb	feat: integrate bmm_fp8 kernel into sgl-kernel (#3056)	2025-01-23 00:39:38 +08:00
Yineng Zhang	bcda0c9ee6	sync the upstream updates of flashinfer (#3051)	2025-01-22 20:33:13 +08:00

74 Commits