559 Commits

Author SHA1 Message Date
Lianmin Zheng
9b8ebb2798 move more files under srt/utils (#11285) 2025-10-09 16:46:15 -07:00
Mick
a3c2ea4451 fix: fix revision for sgl-flash-attn in sgl-kernel (#11327) 2025-10-08 15:50:44 -07:00
Yuan Luo
4f42c8cd3e [sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-10-07 14:31:11 +00:00
sglang-bot
8c9670375f chore: bump sgl-kernel version to 0.3.15 (#11281) 2025-10-06 18:17:51 -07:00
Yineng Zhang
fb27d38305 docs: update sgl-kernel README (#11286) 2025-10-06 17:55:22 -07:00
Lifu Huang
748f86f3de [Bug] Fix incorrect assertion in FA4 and add UT. (#11182) 2025-10-06 14:58:39 -07:00
Lianmin Zheng
d645ae90a3 Rename runner labels (#11228) 2025-10-05 18:05:41 -07:00
Lianmin Zheng
148d8d485d Update DeepGEMM repository tag to specific commit (#11229) 2025-10-05 13:47:36 -07:00
PGFLMG
1a599509cc chore: bump sgl-kernel v0.3.14.post1 (#11137) 2025-10-05 13:46:43 -07:00
DarkSharpness
e0b2d3eebe [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-05 10:19:03 -07:00
PGFLMG
580051c5a8 chore: bump sgl-kernel v0.3.14 (#11067) 2025-09-30 02:53:24 -07:00
Xiaoyu Zhang
11965b0daf Fix sgl-kernel benchmark dead code (#11022) 2025-09-29 15:06:40 +08:00
Zhihao Zhang
24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
2025-09-28 21:06:59 -07:00
Lifu Huang
e98d9346c7 [1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940) 2025-09-28 19:59:14 -07:00
Kangyan-Zhou
0c9174108a Unify SGL Kernel Releases (#10701) 2025-09-28 19:48:28 -07:00
Lianmin Zheng
07440f5f34 Fix FusedSetKVBufferArg in RotaryEmbedding (#11003) 2025-09-28 11:17:27 -07:00
Yuan Luo
42245551ef [sgl-kernel] Optimize concat_mla_k kernel (#10543)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
2025-09-28 23:04:22 +08:00
Lianmin Zheng
35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) 2025-09-25 16:18:04 -07:00
Yuhao Yao
fe531d6f4e [Bug] Fix Issue#10215 (#10572) 2025-09-25 09:51:50 +08:00
Xiaoyu Zhang
c4e314f986 Restruct sgl-kernel benchmark (#10861) 2025-09-25 07:45:25 +08:00
Yineng Zhang
e53df7c009 chore: bump sgl-kernel v0.3.12 (#10732) 2025-09-22 14:39:25 -07:00
Qi Yuhang
0f04a5f428 Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714) 2025-09-21 17:04:27 -07:00
Yuan Luo
616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-19 14:12:09 +08:00
Yineng Zhang
5bfafdfcb4 chore: bump sgl-kernel 0.3.11 (#10630) 2025-09-18 18:43:20 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
Zaili Wang
6fd4816d9f Fix sgl_kernel import failure on devices other than CUDA (#10610) 2025-09-18 11:38:02 -07:00
EduardDurech
a77564e0fb CUDA Arch Independent (#8813) 2025-09-16 23:01:45 -07:00
cicirori
a2f7218a2e support using fa4 on deepseek on blackwell (#9928) 2025-09-16 16:16:06 -07:00
Qi Yuhang
9b876889b7 Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491) 2025-09-16 02:47:37 -07:00
Yineng Zhang
5207424014 chore: bump v0.3.10 sgl-kernel (#10478) 2025-09-15 15:20:09 -07:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
Lianmin Zheng
50dc0c1e9c Run tests based on labels (#10456) 2025-09-15 00:29:20 -07:00
fzyzcjy
ca63f075b7 Revert "Fix FA4 import cause moe_fused_gate output be illegal memory" (#10432) 2025-09-14 19:03:27 -07:00
fzyzcjy
258d02c86d Fix correction bias undefined behavior for nvfp4 models (#10426) 2025-09-14 18:41:09 -07:00
Yineng Zhang
55025b9282 fix: use latest flashinfer (#10428) 2025-09-14 15:07:14 -07:00
Lianmin Zheng
c9ec4cae5b Fix the style of sgl kernel (#10398) 2025-09-12 22:20:21 -07:00
fzyzcjy
3a77c80b26 Fix FA4 import cause moe_fused_gate output be illegal memory (#10368) 2025-09-12 03:21:26 -07:00
Hubert Lu
fe68c1486f Fix errors of hicache kernels in sgl-kernel for ROCm (#10339) 2025-09-11 14:54:34 -07:00
Yineng Zhang
532f998b0f chore: bump sgl-kernel 0.3.9.post2 (#10311) 2025-09-11 01:29:50 -07:00
Yineng Zhang
de15d1405a Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310) 2025-09-11 01:27:58 -07:00
Yineng Zhang
5b7448de77 chore: bump sgl-kernel 0.3.9.post1 (#10294) 2025-09-10 18:26:34 -07:00
Yineng Zhang
6d55f60e77 Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) 2025-09-10 18:24:23 -07:00
Rain Jiang
2286e85e77 pass a_scale from fp8 quant result instead of hard code to 1.0f (#10241)
Co-authored-by: Yichen Wang <yichen.wang@bytedance.com>
Co-authored-by: Jinwu Guo <641876696@qq.com>
2025-09-10 12:56:05 -07:00
huangtingwei
5be8c2f7f7 Page first direct IO kernel (#10060)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-10 13:35:34 +08:00
Yi Zhang
8cbe1538ef Add mamba kernel (#10234) 2025-09-09 12:58:43 -07:00
Yineng Zhang
f3817cb0b2 chore: bump v0.3.9 sgl-kernel (#10208) 2025-09-09 01:40:05 -07:00
Yineng Zhang
94fb4e9e54 feat: support fa cute in sgl-kernel (#10205)
Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>
2025-09-09 00:14:39 -07:00
blzheng
d1d4074c4e [CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300) 2025-09-08 23:23:13 -07:00
Keyang Ru
718f25ae6e Explicitly export CMAKE_BUILD_PARALLEL_LEVEL (#10193) 2025-09-08 22:35:27 -07:00
Yineng Zhang
cdc56ef6c1 feat: use sgl-kernel cu129 as default (#10188) 2025-09-08 22:01:17 -07:00