sglang

Author	SHA1	Message	Date
yinfan98	d2e507df3c	[Misc] clean up vllm in sgl-kernel test (#5189)	2025-04-09 01:22:13 -07:00
yinfan98	9798e72baa	[Misc] Use pytest.mark.skipif in sgl-kernel test (#5137)	2025-04-07 21:35:14 -07:00
Yi Zhang	bcbbf519f9	sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)	2025-04-05 14:23:20 -07:00
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918)	2025-04-04 01:59:29 -07:00
yinfan98	b8b6008f47	[Fix] fix fa3 build at cu118 (#5036)	2025-04-03 11:52:35 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817)	2025-04-02 18:29:59 -07:00
yinfan98	37c66ec856	[feat] add fa3 in sgl-kernel (#4902) Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>	2025-03-30 12:57:10 -07:00
Adarsh Shirawalmath	9fccda3111	[Feature] use pytest for sgl-kernel (#4896)	2025-03-30 10:36:52 -07:00
Yi Zhang	5ec5eaf760	fix allreduce test (#4909)	2025-03-29 23:16:53 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
yinfan98	ddf8981d91	Delete test_deep_gemm.py (#4891)	2025-03-29 10:46:11 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
AniZpZ	321ab756bc	[1/3] fix dsv3 awq issue (#4556) Co-authored-by: leoneo <1320612015@qq.com>	2025-03-22 01:07:17 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)	2025-03-17 00:03:43 -07:00
Yineng Zhang	9971dc2283	Revert "feat: Add FlashMLA submodule (#4449)" (#4470)	2025-03-16 01:30:05 -07:00
Ying Sheng	52a34d7448	Add greedy verification kernel (#4383)	2025-03-16 00:58:26 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Shi Shuai	81f431eded	feat: Add FlashMLA submodule (#4449) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-15 23:30:25 -07:00
Qingquan Song	61e4433caf	Add moe topk softmax templated from vllm (#4302)	2025-03-14 12:03:33 -07:00
Yineng Zhang	2937387a50	fix accuracy issue (#4376)	2025-03-13 02:06:22 -07:00
Qingquan Song	4068e01292	Fix per token fp8 quant precision (#4362)	2025-03-12 21:19:05 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)	2025-03-12 00:10:02 -07:00
Xiaoyu Zhang	23308a9032	fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231)	2025-03-10 01:42:58 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252)	2025-03-10 01:24:22 -07:00
laixin	c553e1604c	DeepGemm integrate to sgl-kernel (#4165) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-03-10 00:35:07 -07:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
lukec	b93ef5e56d	Remove the vllm dependency from the moe_align function (#4164) Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>	2025-03-07 22:42:16 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089)	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)	2025-03-06 18:05:43 -08:00
Lianmin Zheng	110e006673	Reorganize python source files in sgl-kernel with multiple files (#4027)	2025-03-03 06:36:40 -08:00
Lianmin Zheng	6b45a21d16	Reorganize c++ source files in sgl-kernel with multiple folders (#4025)	2025-03-03 05:32:30 -08:00
Chayenne	18bb216c28	Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)	2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team	1c96fa86cf	[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)	2025-02-27 19:42:48 -08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Yineng Zhang	f9905d59a8	support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-07 20:29:51 +08:00
Xiaoyu Zhang	ad3499858e	clean moe align block kernel code and add acc test (#3332)	2025-02-06 16:42:36 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175)	2025-01-27 19:11:01 +08:00
Byron Hsu	514f37c32b	[kernel] Fix position ids in rope (#3173)	2025-01-27 17:09:51 +08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)	2025-01-27 15:28:00 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
Xiaoyu Zhang	5d9d15e70f	support fp32 in sampling_scaling_penalties kernel (#3121)	2025-01-25 16:52:17 +08:00
Yineng Zhang	5de4051bcf	feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com>	2025-01-24 01:54:47 +08:00
Xiaoyu Zhang	e0cd65c2b6	[hotfix] fix test_sampling_scaling_penalties.py ci test (#3084)	2025-01-24 00:33:59 +08:00
Xiaoyu Zhang	f1b6861828	use flashinfer vec_dtypes in sgl_kernel (#3083)	2025-01-23 22:19:04 +08:00
Yineng Zhang	0da0989ad4	sync flashinfer and update sgl-kernel tests (#3081)	2025-01-23 21:13:55 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00

69 Commits