sglang

Author	SHA1	Message	Date
ChangyiYang	485a023bd8	refactor apply_w8a8_block_fp8_linear in fp (#6545)	2025-05-29 00:15:11 -07:00
HandH1998	4d643f6c7a	[1/2] Support Qserve (#6457) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-05-21 19:48:59 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Zhaoyi Li	3c9740d200	update variable naming and comments for rocm (#5299)	2025-04-11 23:15:05 -07:00
Yineng Zhang	136b8e6afb	fix: remove cublas_grouped_gemm (#5307)	2025-04-11 16:22:37 -07:00
Xiaoyu Zhang	f730362ee2	reduce moe_align_block_size_kernel small batch mode overhead (#5086)	2025-04-09 17:59:35 -07:00
Yi Zhang	ebf495f013	sgl-kernel use cutlass latest version for fp8 blockwise gemm (#5207)	2025-04-09 11:47:04 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817)	2025-04-02 18:29:59 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Qingquan Song	61e4433caf	Add moe topk softmax templated from vllm (#4302)	2025-03-14 12:03:33 -07:00
Yineng Zhang	2937387a50	fix accuracy issue (#4376)	2025-03-13 02:06:22 -07:00
Qingquan Song	4068e01292	Fix per token fp8 quant precision (#4362)	2025-03-12 21:19:05 -07:00
Shi Shuai	817d43705c	feat: support ep size < 32 for sgl kernel (#4348)	2025-03-12 20:50:46 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)	2025-03-12 00:10:02 -07:00
Xiaoyu Zhang	7130a7cea9	refine sgl_moe_align_block_size_benchmark (#4327)	2025-03-11 22:48:38 -07:00
Stefan He	95085d65e9	[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)	2025-03-06 22:58:52 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089)	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)	2025-03-06 18:05:43 -08:00
Xiaoyu Zhang	55a7ec388f	use warp shuffle style reduce and flashinfer vectorize (#3628)	2025-02-19 20:53:51 +08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Xiaoyu Zhang	81262c7b72	clean up useless file (#3192)	2025-01-28 14:29:30 +08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Ke Bao	7bad7e75bf	Add shapes for int8 gemm benchmark (#3093)	2025-01-24 12:27:30 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	b7f3fec13c	minor: rename bench for sgl kernel (#2909)	2025-01-16 05:55:43 +08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870)	2025-01-13 23:09:00 +08:00
Ke Bao	0f3eb1d294	Support cutlass Int8 gemm (#2752)	2025-01-06 22:51:22 +08:00

33 Commits