sglang

Author	SHA1	Message	Date
Ke Bao	a3398d8478	Optimize moe align block size kernel (#7794)	2025-07-07 09:20:30 +08:00
Lianmin Zheng	5589b75024	Add treemask mode to build_eagle_tree & release sgl-kernel 0.2.3 (#7756) Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com>	2025-07-05 12:17:05 -07:00
SijiaYang	da3890e82a	[1/n]: add cutlass W4A8 moe kernel for hopper architecture (#7772) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> Co-authored-by: yicwang <yichen.wang@bytedance.com>	2025-07-04 20:50:12 -07:00
Yi Zhang	2998c4bdf4	[optimize] fuse renormalize into moe_topk_softmax (#7744) Co-authored-by: ispobock <ispobaoke@gmail.com>	2025-07-03 12:42:44 -07:00
AniZpZ	8e03b641ba	[1/n] apply wna16marlin kernel in moe weight only quantization (#7683) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: 弋云 <yiyun.wyt@antgroup.com> Co-authored-by: walker-ai <2398833647@qq.com>	2025-07-01 23:21:25 -07:00
Baizhou Zhang	7248272ccc	Add dsv3 router gemm kernel (#7627)	2025-06-29 23:31:55 -07:00
Ke Bao	04b35190e2	Add dsv3 fused a gemm to sgl-kernel (#7630)	2025-06-29 02:52:24 -07:00
Ke Bao	57ab776910	Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437)	2025-06-24 17:44:27 -07:00
Zhiqiang Xie	34c3f9b2d3	kvcache io kernels and test case (#7382)	2025-06-23 11:58:59 -07:00
Lianmin Zheng	cfceb83d05	Fix sampling for speculative decoding & simplify kernels (#7207)	2025-06-16 03:28:30 -07:00
JieXin Liang	ab1a4fa5cb	[fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184)	2025-06-14 12:45:41 -07:00
fzyzcjy	5c66c4424f	Support new DeepGEMM format in per token group quant (#7146)	2025-06-13 02:00:22 -07:00
fzyzcjy	aa46ed34d2	Remove 200us slow concat kernel (part 1: kernel) (#7145)	2025-06-13 01:58:29 -07:00
Yuan Luo	84727a5139	[sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-06-11 20:43:08 -07:00
JieXin Liang	18efb5e8e0	[perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 (#6929)	2025-06-08 19:37:34 -07:00
Elfie Guo	3e56f557fd	Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916) Co-authored-by: Elfie Guo <elfiegxf@gmail.com>	2025-06-07 15:24:39 -07:00
Yuan Luo	43baba649e	[EP] Add cuda kernel for moe_ep_post_reorder (#6837) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-06-05 00:33:47 -07:00
zyksir	8e3797be1c	support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)	2025-06-04 22:11:24 -07:00
Cheng Wan	8a5480528d	[Refactor] Rename `n_share_experts_fusion` as `num_fused_shared_experts` (#6735)	2025-06-03 17:48:24 -07:00
Pavani Majety	eb38c7d1ca	[1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-06-02 13:48:03 -07:00
Yuan Luo	55444ed667	[EP] Add cuda kernel for moe_ep_pre_reorder (#6699) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-06-01 20:49:01 -07:00
HandH1998	4d643f6c7a	[1/2] Support Qserve (#6457) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-05-21 19:48:59 -07:00
Elfie Guo	6fc9357503	[2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694)	2025-05-16 13:14:07 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
PGFLMG	ee71ed8a41	[Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847) Co-authored-by: sighingnow <sighingnow@gmail.com>	2025-04-28 11:03:17 -07:00
Yineng Zhang	15fabcc07f	fix sgl-kernel unit tests (#5666)	2025-04-23 01:18:30 -07:00
Elfie Guo	e62c49557d	[1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281)	2025-04-22 22:28:20 -07:00
Xiaoyu Zhang	8e09b37077	Sgl kernel fused_moe_gate support n_shared_experts (#5440)	2025-04-17 23:05:15 -07:00
PGFLMG	c08a717c77	[Feat] Update sgl-kernel flashinfer to latest main version (#5500) Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-17 12:43:23 -07:00
DefTruth	388e15c0db	kernel: support slightly faster merge_state_v2 cuda kernel (#5381)	2025-04-14 21:28:23 -07:00
Yineng Zhang	b62e7e99b8	feat: adapt merge_state (#5337)	2025-04-12 21:14:04 -07:00
PGFLMG	4879e50c6d	[Feat] Add sparse attn to sgl-kernel (#5327)	2025-04-12 11:36:36 -07:00
Trevor Morris	f65b8d5c89	Blackwell Cutlass MLA kernel (#5142)	2025-04-11 22:16:51 -07:00
Yineng Zhang	136b8e6afb	fix: remove cublas_grouped_gemm (#5307)	2025-04-11 16:22:37 -07:00
Yi Zhang	bcbbf519f9	sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)	2025-04-05 14:23:20 -07:00
yinfan98	b8b6008f47	[Fix] fix fa3 build at cu118 (#5036)	2025-04-03 11:52:35 -07:00
yinfan98	37c66ec856	[feat] add fa3 in sgl-kernel (#4902) Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>	2025-03-30 12:57:10 -07:00
Qingquan Song	45dcfc2e76	Add deepseek style fused moe group gate selection kernel (#4530)	2025-03-29 11:51:45 -07:00
Yineng Zhang	8bf6d7f406	support cmake for sgl-kernel (#4706) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-27 01:42:28 -07:00
Trevor Morris	e9f8e42318	Support FP4 gemm (1/2) (#3899)	2025-03-24 19:50:23 -07:00
Chunan Zeng	65c24c28f9	[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)	2025-03-23 23:44:17 -07:00
Yi Zhang	25e1816eff	fix custom allreduce performance/accuracy problem (#4477)	2025-03-16 12:16:30 -07:00
Ying Sheng	52a34d7448	Add greedy verification kernel (#4383)	2025-03-16 00:58:26 -07:00
Qingquan Song	61e4433caf	Add moe topk softmax templated from vllm (#4302)	2025-03-14 12:03:33 -07:00
Elfie Guo	7c86671131	Support Blackwell Block Scale FP8 Gemm (#4278)	2025-03-12 14:17:11 -07:00
Rex	07f944631e	Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)	2025-03-12 00:10:02 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243)	2025-03-09 20:15:13 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241)	2025-03-09 18:38:15 -07:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00

49 Commits