sglang

Author	SHA1	Message	Date
Fan Yin	3289da5b41	[sgl-kernel] support hadamard (#11663)	2025-10-15 19:00:44 -07:00
Fan Yin	5464457251	[sgl-kernel] Optimize gguf test (#11667)	2025-10-15 15:45:53 -07:00
Qi Yuhang	6c01844f45	[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)	2025-10-15 13:39:31 -07:00
fzyzcjy	32803fb279	Super tiny improve FA3 import error message (#11590)	2025-10-14 22:06:31 -07:00
sglang-bot	98923880bc	chore: bump sgl-kernel version to 0.3.16.post2 (#11583) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 20:52:38 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)	2025-10-13 20:51:45 -07:00
sglang-bot	60b0503227	chore: bump sgl-kernel version to 0.3.16.post1 (#11573) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 16:26:18 -07:00
Qi Yuhang	dc48c4c0e3	[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534)	2025-10-13 16:24:48 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
sglang-bot	2db2cddd12	chore: bump sgl-kernel version to 0.3.16 (#11476)	2025-10-11 22:04:49 -07:00
PGFLMG	8fdcd98efe	[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019)	2025-10-11 14:04:57 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
Lianmin Zheng	9b8ebb2798	move more files under srt/utils (#11285)	2025-10-09 16:46:15 -07:00
Mick	a3c2ea4451	fix: fix revision for sgl-flash-attn in sgl-kernel (#11327)	2025-10-08 15:50:44 -07:00
Yuan Luo	4f42c8cd3e	[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-07 14:31:11 +00:00
sglang-bot	8c9670375f	chore: bump sgl-kernel version to 0.3.15 (#11281)	2025-10-06 18:17:51 -07:00
Yineng Zhang	fb27d38305	docs: update sgl-kernel README (#11286)	2025-10-06 17:55:22 -07:00
Lifu Huang	748f86f3de	[Bug] Fix incorrect assertion in FA4 and add UT. (#11182)	2025-10-06 14:58:39 -07:00
Lianmin Zheng	d645ae90a3	Rename runner labels (#11228)	2025-10-05 18:05:41 -07:00
Lianmin Zheng	148d8d485d	Update DeepGEMM repository tag to specific commit (#11229)	2025-10-05 13:47:36 -07:00
PGFLMG	1a599509cc	chore: bump sgl-kernel v0.3.14.post1 (#11137)	2025-10-05 13:46:43 -07:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
PGFLMG	580051c5a8	chore: bump sgl-kernel v0.3.14 (#11067)	2025-09-30 02:53:24 -07:00
Xiaoyu Zhang	11965b0daf	Fix sgl-kernel benchmark dead code (#11022)	2025-09-29 15:06:40 +08:00
Zhihao Zhang	24f7cb1ece	[speculative decoding] rename lookahead to ngram (#11010) Co-authored-by: a4zhangfei <a4zhangfei@qq.com>	2025-09-28 21:06:59 -07:00
Lifu Huang	e98d9346c7	[1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940)	2025-09-28 19:59:14 -07:00
Kangyan-Zhou	0c9174108a	Unify SGL Kernel Releases (#10701)	2025-09-28 19:48:28 -07:00
Lianmin Zheng	07440f5f34	Fix FusedSetKVBufferArg in RotaryEmbedding (#11003)	2025-09-28 11:17:27 -07:00
Yuan Luo	42245551ef	[sgl-kernel] Optimize concat_mla_k kernel (#10543) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com>	2025-09-28 23:04:22 +08:00
Lianmin Zheng	35ec2a45a8	[minor] Remove deprecated function `get_ip` (#10883)	2025-09-25 16:18:04 -07:00
Yuhao Yao	fe531d6f4e	[Bug] Fix Issue#10215 (#10572)	2025-09-25 09:51:50 +08:00
Xiaoyu Zhang	c4e314f986	Restruct sgl-kernel benchmark (#10861)	2025-09-25 07:45:25 +08:00
Yineng Zhang	e53df7c009	chore: bump sgl-kernel v0.3.12 (#10732)	2025-09-22 14:39:25 -07:00
Qi Yuhang	0f04a5f428	Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714)	2025-09-21 17:04:27 -07:00
Yuan Luo	616a3e20df	[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-19 14:12:09 +08:00
Yineng Zhang	5bfafdfcb4	chore: bump sgl-kernel 0.3.11 (#10630)	2025-09-18 18:43:20 -07:00
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
Zaili Wang	6fd4816d9f	Fix sgl_kernel import failure on devices other than CUDA (#10610)	2025-09-18 11:38:02 -07:00
EduardDurech	a77564e0fb	CUDA Arch Independent (#8813)	2025-09-16 23:01:45 -07:00
cicirori	a2f7218a2e	support using fa4 on deepseek on blackwell (#9928)	2025-09-16 16:16:06 -07:00
Qi Yuhang	9b876889b7	Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491)	2025-09-16 02:47:37 -07:00
Yineng Zhang	5207424014	chore: bump v0.3.10 sgl-kernel (#10478)	2025-09-15 15:20:09 -07:00
fzyzcjy	3b25dc127a	[1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473)	2025-09-15 11:53:21 -07:00
Lianmin Zheng	50dc0c1e9c	Run tests based on labels (#10456)	2025-09-15 00:29:20 -07:00
fzyzcjy	ca63f075b7	Revert "Fix FA4 import cause moe_fused_gate output be illegal memory" (#10432)	2025-09-14 19:03:27 -07:00
fzyzcjy	258d02c86d	Fix correction bias undefined behavior for nvfp4 models (#10426)	2025-09-14 18:41:09 -07:00
Yineng Zhang	55025b9282	fix: use latest flashinfer (#10428)	2025-09-14 15:07:14 -07:00
Lianmin Zheng	c9ec4cae5b	Fix the style of sgl kernel (#10398)	2025-09-12 22:20:21 -07:00
fzyzcjy	3a77c80b26	Fix FA4 import cause moe_fused_gate output be illegal memory (#10368)	2025-09-12 03:21:26 -07:00

572 Commits