Commit Graph

201 Commits

Author SHA1 Message Date
sglang-bot
283c8ba031 chore: bump sgl-kernel version to 0.3.16.post3 (#11733) 2025-10-19 21:44:15 -05:00
hlu1
3b80232d06 [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-10-19 17:13:39 -07:00
Johnny
252dc4e112 [NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-19 17:10:10 -07:00
fzyzcjy
a27825ae01 Support not officially supported high sgl-kernel version with low srt version (#11786) 2025-10-19 16:11:59 +08:00
Fan Yin
3289da5b41 [sgl-kernel] support hadamard (#11663) 2025-10-15 19:00:44 -07:00
Qi Yuhang
6c01844f45 [sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674) 2025-10-15 13:39:31 -07:00
fzyzcjy
32803fb279 Super tiny improve FA3 import error message (#11590) 2025-10-14 22:06:31 -07:00
sglang-bot
98923880bc chore: bump sgl-kernel version to 0.3.16.post2 (#11583)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-13 20:52:38 -07:00
Yineng Zhang
f792e3c561 Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582) 2025-10-13 20:51:45 -07:00
sglang-bot
60b0503227 chore: bump sgl-kernel version to 0.3.16.post1 (#11573)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-13 16:26:18 -07:00
Qi Yuhang
dc48c4c0e3 [sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534) 2025-10-13 16:24:48 -07:00
Johnny
b8c430f1ce [NVIDIA] BUMP FA3 (#11444)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2025-10-13 09:30:57 -07:00
Qi Yuhang
9a30914e94 [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-10-12 20:19:21 -07:00
sglang-bot
2db2cddd12 chore: bump sgl-kernel version to 0.3.16 (#11476) 2025-10-11 22:04:49 -07:00
PGFLMG
8fdcd98efe [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) 2025-10-11 14:04:57 -07:00
fzyzcjy
21337b22b9 Reland [1/2] Optimizations and refactors about quant kernel (#10312)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-10-11 15:59:03 +08:00
sglang-bot
8c9670375f chore: bump sgl-kernel version to 0.3.15 (#11281) 2025-10-06 18:17:51 -07:00
Lifu Huang
748f86f3de [Bug] Fix incorrect assertion in FA4 and add UT. (#11182) 2025-10-06 14:58:39 -07:00
PGFLMG
1a599509cc chore: bump sgl-kernel v0.3.14.post1 (#11137) 2025-10-05 13:46:43 -07:00
DarkSharpness
e0b2d3eebe [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-05 10:19:03 -07:00
PGFLMG
580051c5a8 chore: bump sgl-kernel v0.3.14 (#11067) 2025-09-30 02:53:24 -07:00
Lifu Huang
e98d9346c7 [1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940) 2025-09-28 19:59:14 -07:00
Kangyan-Zhou
0c9174108a Unify SGL Kernel Releases (#10701) 2025-09-28 19:48:28 -07:00
Lianmin Zheng
07440f5f34 Fix FusedSetKVBufferArg in RotaryEmbedding (#11003) 2025-09-28 11:17:27 -07:00
Lianmin Zheng
35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) 2025-09-25 16:18:04 -07:00
Yineng Zhang
e53df7c009 chore: bump sgl-kernel v0.3.12 (#10732) 2025-09-22 14:39:25 -07:00
Yuan Luo
616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-19 14:12:09 +08:00
Yineng Zhang
5bfafdfcb4 chore: bump sgl-kernel 0.3.11 (#10630) 2025-09-18 18:43:20 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
Zaili Wang
6fd4816d9f Fix sgl_kernel import failure on devices other than CUDA (#10610) 2025-09-18 11:38:02 -07:00
EduardDurech
a77564e0fb CUDA Arch Independent (#8813) 2025-09-16 23:01:45 -07:00
cicirori
a2f7218a2e support using fa4 on deepseek on blackwell (#9928) 2025-09-16 16:16:06 -07:00
Yineng Zhang
5207424014 chore: bump v0.3.10 sgl-kernel (#10478) 2025-09-15 15:20:09 -07:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
fzyzcjy
ca63f075b7 Revert "Fix FA4 import cause moe_fused_gate output be illegal memory" (#10432) 2025-09-14 19:03:27 -07:00
Lianmin Zheng
c9ec4cae5b Fix the style of sgl kernel (#10398) 2025-09-12 22:20:21 -07:00
fzyzcjy
3a77c80b26 Fix FA4 import cause moe_fused_gate output be illegal memory (#10368) 2025-09-12 03:21:26 -07:00
Yineng Zhang
532f998b0f chore: bump sgl-kernel 0.3.9.post2 (#10311) 2025-09-11 01:29:50 -07:00
Yineng Zhang
5b7448de77 chore: bump sgl-kernel 0.3.9.post1 (#10294) 2025-09-10 18:26:34 -07:00
Yineng Zhang
6d55f60e77 Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) 2025-09-10 18:24:23 -07:00
huangtingwei
5be8c2f7f7 Page first direct IO kernel (#10060)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-10 13:35:34 +08:00
Yi Zhang
8cbe1538ef Add mamba kernel (#10234) 2025-09-09 12:58:43 -07:00
Yineng Zhang
f3817cb0b2 chore: bump v0.3.9 sgl-kernel (#10208) 2025-09-09 01:40:05 -07:00
Yineng Zhang
94fb4e9e54 feat: support fa cute in sgl-kernel (#10205)
Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com>
2025-09-09 00:14:39 -07:00
fzyzcjy
0096798ed6 [1/2] Speed up prefill mla attention (#10156) 2025-09-08 09:00:33 -07:00
hlu1
5f1eb20484 [chore] Remove unused ep_moe cuda kernels (#9956) 2025-09-06 01:35:50 -07:00
fzyzcjy
bd7f882142 Support copying tensor from cpu to gpu without using copy engines (#10007) 2025-09-05 20:07:19 +08:00
fzyzcjy
339f8eef09 [1/2] Optimizations and refactors about quant kernel (#9534) 2025-09-05 18:45:08 +08:00
Yineng Zhang
a96c5b5c14 chore: bump v0.3.8 sgl-kernel (#9907) 2025-09-02 01:27:26 -07:00
Yineng Zhang
c5082f0f73 chore: fix cuda driver api issue and bump sgl-kernel 0.3.7.post1 (#9746) 2025-08-30 02:01:54 -07:00