sglang

Author	SHA1	Message	Date
Lianmin Zheng	9b8ebb2798	move more files under srt/utils (#11285)	2025-10-09 16:46:15 -07:00
Mick	a3c2ea4451	fix: fix revision for sgl-flash-attn in sgl-kernel (#11327)	2025-10-08 15:50:44 -07:00
Yuan Luo	4f42c8cd3e	[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-07 14:31:11 +00:00
sglang-bot	8c9670375f	chore: bump sgl-kernel version to 0.3.15 (#11281)	2025-10-06 18:17:51 -07:00
Yineng Zhang	fb27d38305	docs: update sgl-kernel README (#11286)	2025-10-06 17:55:22 -07:00
Lifu Huang	748f86f3de	[Bug] Fix incorrect assertion in FA4 and add UT. (#11182)	2025-10-06 14:58:39 -07:00
Lianmin Zheng	d645ae90a3	Rename runner labels (#11228)	2025-10-05 18:05:41 -07:00
Lianmin Zheng	148d8d485d	Update DeepGEMM repository tag to specific commit (#11229)	2025-10-05 13:47:36 -07:00
PGFLMG	1a599509cc	chore: bump sgl-kernel v0.3.14.post1 (#11137)	2025-10-05 13:46:43 -07:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
PGFLMG	580051c5a8	chore: bump sgl-kernel v0.3.14 (#11067)	2025-09-30 02:53:24 -07:00
Xiaoyu Zhang	11965b0daf	Fix sgl-kernel benchmark dead code (#11022)	2025-09-29 15:06:40 +08:00
Zhihao Zhang	24f7cb1ece	[speculative decoding] rename lookahead to ngram (#11010) Co-authored-by: a4zhangfei <a4zhangfei@qq.com>	2025-09-28 21:06:59 -07:00
Lifu Huang	e98d9346c7	[1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940)	2025-09-28 19:59:14 -07:00
Kangyan-Zhou	0c9174108a	Unify SGL Kernel Releases (#10701)	2025-09-28 19:48:28 -07:00
Lianmin Zheng	07440f5f34	Fix FusedSetKVBufferArg in RotaryEmbedding (#11003)	2025-09-28 11:17:27 -07:00
Yuan Luo	42245551ef	[sgl-kernel] Optimize concat_mla_k kernel (#10543) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com>	2025-09-28 23:04:22 +08:00
Lianmin Zheng	35ec2a45a8	[minor] Remove deprecated function `get_ip` (#10883)	2025-09-25 16:18:04 -07:00
Yuhao Yao	fe531d6f4e	[Bug] Fix Issue#10215 (#10572)	2025-09-25 09:51:50 +08:00
Xiaoyu Zhang	c4e314f986	Restruct sgl-kernel benchmark (#10861)	2025-09-25 07:45:25 +08:00
Yineng Zhang	e53df7c009	chore: bump sgl-kernel v0.3.12 (#10732)	2025-09-22 14:39:25 -07:00
Qi Yuhang	0f04a5f428	Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714)	2025-09-21 17:04:27 -07:00
Yuan Luo	616a3e20df	[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-19 14:12:09 +08:00
Yineng Zhang	5bfafdfcb4	chore: bump sgl-kernel 0.3.11 (#10630)	2025-09-18 18:43:20 -07:00
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
Zaili Wang	6fd4816d9f	Fix sgl_kernel import failure on devices other than CUDA (#10610)	2025-09-18 11:38:02 -07:00
EduardDurech	a77564e0fb	CUDA Arch Independent (#8813)	2025-09-16 23:01:45 -07:00
cicirori	a2f7218a2e	support using fa4 on deepseek on blackwell (#9928)	2025-09-16 16:16:06 -07:00
Qi Yuhang	9b876889b7	Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491)	2025-09-16 02:47:37 -07:00
Yineng Zhang	5207424014	chore: bump v0.3.10 sgl-kernel (#10478)	2025-09-15 15:20:09 -07:00
fzyzcjy	3b25dc127a	[1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473)	2025-09-15 11:53:21 -07:00
Lianmin Zheng	50dc0c1e9c	Run tests based on labels (#10456)	2025-09-15 00:29:20 -07:00
fzyzcjy	ca63f075b7	Revert "Fix FA4 import cause moe_fused_gate output be illegal memory" (#10432)	2025-09-14 19:03:27 -07:00
fzyzcjy	258d02c86d	Fix correction bias undefined behavior for nvfp4 models (#10426)	2025-09-14 18:41:09 -07:00
Yineng Zhang	55025b9282	fix: use latest flashinfer (#10428)	2025-09-14 15:07:14 -07:00
Lianmin Zheng	c9ec4cae5b	Fix the style of sgl kernel (#10398)	2025-09-12 22:20:21 -07:00
fzyzcjy	3a77c80b26	Fix FA4 import cause moe_fused_gate output be illegal memory (#10368)	2025-09-12 03:21:26 -07:00
Hubert Lu	fe68c1486f	Fix errors of hicache kernels in sgl-kernel for ROCm (#10339)	2025-09-11 14:54:34 -07:00
Yineng Zhang	532f998b0f	chore: bump sgl-kernel 0.3.9.post2 (#10311)	2025-09-11 01:29:50 -07:00
Yineng Zhang	de15d1405a	Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310)	2025-09-11 01:27:58 -07:00
Yineng Zhang	5b7448de77	chore: bump sgl-kernel 0.3.9.post1 (#10294)	2025-09-10 18:26:34 -07:00
Yineng Zhang	6d55f60e77	Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292)	2025-09-10 18:24:23 -07:00
Rain Jiang	2286e85e77	pass a_scale from fp8 quant result instead of hard code to 1.0f (#10241) Co-authored-by: Yichen Wang <yichen.wang@bytedance.com> Co-authored-by: Jinwu Guo <641876696@qq.com>	2025-09-10 12:56:05 -07:00
huangtingwei	5be8c2f7f7	Page first direct IO kernel (#10060) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-09-10 13:35:34 +08:00
Yi Zhang	8cbe1538ef	Add mamba kernel (#10234)	2025-09-09 12:58:43 -07:00
Yineng Zhang	f3817cb0b2	chore: bump v0.3.9 sgl-kernel (#10208)	2025-09-09 01:40:05 -07:00
Yineng Zhang	94fb4e9e54	feat: support fa cute in sgl-kernel (#10205) Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>	2025-09-09 00:14:39 -07:00
blzheng	d1d4074c4e	[CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300)	2025-09-08 23:23:13 -07:00
Keyang Ru	718f25ae6e	Explicitly export CMAKE_BUILD_PARALLEL_LEVEL (#10193)	2025-09-08 22:35:27 -07:00
Yineng Zhang	cdc56ef6c1	feat: use sgl-kernel cu129 as default (#10188)	2025-09-08 22:01:17 -07:00

559 Commits