sglang

Author	SHA1	Message	Date
hlu1	1e85589dc5	Make fp4_quantize kernels work on sm103 (#9807 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-29 21:15:08 -07:00
Kaixi Hou	5c34b4f1c7	[NVIDIA] [2/N] Optimize `silu_and_mul_scaled_fp4_grouped_quant` perf (#9556 )	2025-08-29 17:17:03 -07:00
hlu1	7a16db9bd9	Make sm100 fp8 kernels available on sm103 (#9789 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-28 23:47:29 -07:00
Kaixi Hou	e5638573c1	[NVIDIA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200 )	2025-08-22 12:19:45 -07:00
Hubert Lu	c6c379ab31	[AMD] Reorganize hip-related header files in sgl-kernel (#9320 )	2025-08-18 16:53:44 -07:00
jy-song-hub	4fc09e0df0	Fp4 MOE quant kernel optimization (#8777 ) Co-authored-by: Rain Jiang <96632942+rainj-me@users.noreply.github.com>	2025-08-15 01:46:16 -07:00
Yuan Luo	432f2053dd	[sgl-kernel] 1/N Refactor sglang cutlass 3x - gemm fp8 blockwise sm90 (#8913 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-08-14 10:55:54 -07:00
Peng Zhang	5aa1ebd242	[2/n]decouple quantization implementation from vLLM dependency (#8112 ) Co-authored-by: walker-ai <yiyun.wyt@antgroup.com> Co-authored-by: leoneo <1320612015@qq.com>	2025-08-14 03:19:03 -07:00
henryg	841810f227	[Perf] Tunings for SM100 FP8 CUTLASS kernel (#8818 )	2025-08-13 21:59:22 -07:00
triple-mu	444013585d	Fix typos and unify size(s)/stride(s) API calls (#8799 )	2025-08-08 00:18:08 -07:00
Xiaoyu Zhang	f57d2dc162	[sgl-kernel] avoid per_token_quant_fp8.cu hardcode sm_count (#8738 )	2025-08-04 12:55:57 +08:00
Stefan He	db7343c992	fix per token cuda kernel hidden dim cannot divide by 16 (#8543 )	2025-08-01 09:27:18 -07:00
Peter Pan	6bdd27861b	[Kimi K2] dsv3_router_gemm supports NUM_EXPERTS == 384 (#8013 )	2025-08-01 22:01:24 +08:00
Xiaoyu Zhang	7a4309cc8a	[sgl-kernel performance] fix fp8 quant kernels dispatch __nv_fp8_e4m3 bug to improve performance 10%-20% (#8499 ) Co-authored-by: Ke Bao <ispobaoke@gmail.com>	2025-07-29 23:31:54 +08:00
strgrb	fb4ce17de6	Fix per_token_group_quant_8bit when hidden_dim // group_size is not divided by 4. (#8449 ) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-07-28 01:32:46 -07:00
Yuan Luo	0c8dab9e67	[sgl-kernel] Opt per_token_quant_fp8 with warp reduce (#8130 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-07-23 21:22:59 +08:00
Baizhou Zhang	282eb59ff3	Add bf16 output option for dsv3_router_gemm kernel (#7999 )	2025-07-20 09:49:37 +08:00
Qi Yuhang	6e92da8fca	[Fix][Ready]Fix register spilling in cutlass nvfp4 gemm kernel on Blackwell (#8127 )	2025-07-17 20:49:36 -07:00
likesen-alibaba	4a0d19198b	Fix bug of deepseek-v3 under DP+EP mode with large batchsize/seqlen (#6449 )	2025-07-10 01:19:56 -07:00
Baizhou Zhang	7248272ccc	Add dsv3 router gemm kernel (#7627 )	2025-06-29 23:31:55 -07:00
Ke Bao	04b35190e2	Add dsv3 fused a gemm to sgl-kernel (#7630 )	2025-06-29 02:52:24 -07:00
AniZpZ	3eb4a800e8	Fix AWQ Dequant and Weight Loading of deepseek v2 (#6842 )	2025-06-17 13:45:10 -07:00
fzyzcjy	5c66c4424f	Support new DeepGEMM format in per token group quant (#7146 )	2025-06-13 02:00:22 -07:00
Pavani Majety	eb38c7d1ca	[1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-06-02 13:48:03 -07:00
HandH1998	4d643f6c7a	[1/2] Support Qserve (#6457 ) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-05-21 19:48:59 -07:00
Elfie Guo	c23a7072b6	Upgrade CUTLASS 4.0 (#6336 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-05-15 17:42:23 -07:00
Yineng Zhang	6f56614734	chore: upgrade cutlass 3.9.2 (#6004 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-06 13:34:08 -07:00
Xiaoyu Zhang	5bb0accbcf	cutlass 3.9 supported to improve fp8_blockwise_gemm (#5820 )	2025-04-28 21:52:36 -07:00
Yineng Zhang	136b8e6afb	fix: remove cublas_grouped_gemm (#5307 )	2025-04-11 16:22:37 -07:00
Yi Zhang	ebf495f013	sgl-kernel use cutlass latest version for fp8 blockwise gemm (#5207 )	2025-04-09 11:47:04 -07:00
Xiaoyu Zhang	2c8fd99363	[sgl-kernel] per token group quant support COLUMN MAJOR (#4817 )	2025-04-02 18:29:59 -07:00
Yuhong Guo	ee47a6c1c3	[Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953 )	2025-03-31 12:00:34 -07:00
Yineng Zhang	ec3ee0289d	fix sgl-kernel cu118 build (#4872 )	2025-03-28 17:23:51 -07:00
Yineng Zhang	8bf6d7f406	support cmake for sgl-kernel (#4706 ) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-27 01:42:28 -07:00
Yi Pan	45fdf1f7f3	Fix shared memory OOM on sm86 GPUs. (#4797 )	2025-03-26 10:41:53 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899 )	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396 )	2025-03-23 23:44:17 -07:00
AniZpZ	321ab756bc	[1/3] fix dsv3 awq issue (#4556 ) Co-authored-by: leoneo <1320612015@qq.com>	2025-03-22 01:07:17 -07:00
Chunan Zeng	6a384d5c01	Speed up per token and per tensor quant by 15% (#4639 )	2025-03-22 00:37:57 -07:00
Shu Wang	ad4e58bf67	Support fp8 gemm for blackwell (#4558 )	2025-03-20 12:40:28 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418 )	2025-03-17 00:03:43 -07:00
Yineng Zhang	2937387a50	fix accuracy issue (#4376 )	2025-03-13 02:06:22 -07:00
Qingquan Song	4068e01292	Fix per token fp8 quant precision (#4362 )	2025-03-12 21:19:05 -07:00
Elfie Guo	7c86671131	Support Blackwell Block Scale FP8 Gemm (#4278 )	2025-03-12 14:17:11 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104 )	2025-03-12 00:10:02 -07:00
Stefan He	e0917e6bd0	Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215 ) Co-authored-by: Stefan He <bhe@linkedin.com>	2025-03-12 00:08:03 -07:00
Xiaoyu Zhang	23308a9032	fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231 )	2025-03-10 01:42:58 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252 )	2025-03-10 01:24:22 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241 )	2025-03-09 18:38:15 -07:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00

50 Commits