| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Stefan He | 95085d65e9 | [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) | 2025-03-06 22:58:52 -08:00 |
| Stefan He | 63ee26d162 | Add sgl_per_token_quant_fp8 (#4089) | 2025-03-06 20:53:05 -08:00 |
| Xiaoyu Zhang | ad55f17182 | [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) | 2025-03-06 18:05:43 -08:00 |
| Xiaoyu Zhang | 55a7ec388f | use warp shuffle style reduce and flashinfer vectorize (#3628) | 2025-02-19 20:53:51 +08:00 |
| Baizhou Zhang | 67fc595bb8 | [Feature] Apply Cublas Grouped Gemm kernel (#3629) | 2025-02-18 15:18:31 +08:00 |
| yizhang2077 | 640363ad20 | support blockwise fp8 matmul kernel (#3267) | 2025-02-13 01:49:33 +08:00 |
| Xiaoyu Zhang | bb418ced80 | optimize per token group quant fp8 (#3490) | 2025-02-11 22:19:05 +08:00 |
| Xiaoyu Zhang | 81262c7b72 | clean up useless file (#3192) | 2025-01-28 14:29:30 +08:00 |
| HandH1998 | 82392da830 | support w8a8 fp8 kernel with CUTLASS (#3047) (Co-authored-by: yych0745 <1398089567@qq.com>) | 2025-01-26 15:46:51 +08:00 |
| Ke Bao | 7bad7e75bf | Add shapes for int8 gemm benchmark (#3093) | 2025-01-24 12:27:30 +08:00 |
| Xiaoyu Zhang | ac2dc35d0e | support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) | 2025-01-23 15:29:20 +08:00 |
| Yineng Zhang | b7f3fec13c | minor: rename bench for sgl kernel (#2909) | 2025-01-16 05:55:43 +08:00 |
| Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00 |
| Ke Bao | 0f3eb1d294 | Support cutlass Int8 gemm (#2752) | 2025-01-06 22:51:22 +08:00 |