| Author | Commit | Message | Date |
|---|---|---|---|
| maxiao | 8f7453e3af | adapt to ds3.2 | 2025-09-30 17:44:54 +08:00 |
| Xiaoyu Zhang | 11965b0daf | Fix sgl-kernel benchmark dead code (#11022) | 2025-09-29 15:06:40 +08:00 |
| Zhihao Zhang | 24f7cb1ece | [speculative decoding] rename lookahead to ngram (#11010); Co-authored-by: a4zhangfei <a4zhangfei@qq.com> | 2025-09-28 21:06:59 -07:00 |
| Lifu Huang | e98d9346c7 | [1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940) | 2025-09-28 19:59:14 -07:00 |
| Kangyan-Zhou | 0c9174108a | Unify SGL Kernel Releases (#10701) | 2025-09-28 19:48:28 -07:00 |
| Lianmin Zheng | 07440f5f34 | Fix FusedSetKVBufferArg in RotaryEmbedding (#11003) | 2025-09-28 11:17:27 -07:00 |
| Yuan Luo | 42245551ef | [sgl-kernel] Optimize concat_mla_k kernel (#10543); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: PGFLMG <1106310035@qq.com> | 2025-09-28 23:04:22 +08:00 |
| Lianmin Zheng | 35ec2a45a8 | [minor] Remove deprecated function get_ip (#10883) | 2025-09-25 16:18:04 -07:00 |
| Yuhao Yao | fe531d6f4e | [Bug] Fix Issue#10215 (#10572) | 2025-09-25 09:51:50 +08:00 |
| Xiaoyu Zhang | c4e314f986 | Restruct sgl-kernel benchmark (#10861) | 2025-09-25 07:45:25 +08:00 |
| Yineng Zhang | e53df7c009 | chore: bump sgl-kernel v0.3.12 (#10732) | 2025-09-22 14:39:25 -07:00 |
| Qi Yuhang | 0f04a5f428 | Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714) | 2025-09-21 17:04:27 -07:00 |
| Yuan Luo | 616a3e20df | [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> | 2025-09-19 14:12:09 +08:00 |
| Yineng Zhang | 5bfafdfcb4 | chore: bump sgl-kernel 0.3.11 (#10630) | 2025-09-18 18:43:20 -07:00 |
| Zhihao Zhang | e7bc600304 | [Feature] Speculative decoding support lookahead (#9873); Co-authored-by: a4zhangfei <a4zhangfei@qq.com>; Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> | 2025-09-18 16:42:41 -07:00 |
| Zaili Wang | 6fd4816d9f | Fix sgl_kernel import failure on devices other than CUDA (#10610) | 2025-09-18 11:38:02 -07:00 |
| EduardDurech | a77564e0fb | CUDA Arch Independent (#8813) | 2025-09-16 23:01:45 -07:00 |
| cicirori | a2f7218a2e | support using fa4 on deepseek on blackwell (#9928) | 2025-09-16 16:16:06 -07:00 |
| Qi Yuhang | 9b876889b7 | Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491) | 2025-09-16 02:47:37 -07:00 |
| Yineng Zhang | 5207424014 | chore: bump v0.3.10 sgl-kernel (#10478) | 2025-09-15 15:20:09 -07:00 |
| fzyzcjy | 3b25dc127a | [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) | 2025-09-15 11:53:21 -07:00 |
| Lianmin Zheng | 50dc0c1e9c | Run tests based on labels (#10456) | 2025-09-15 00:29:20 -07:00 |
| fzyzcjy | ca63f075b7 | Revert "Fix FA4 import cause moe_fused_gate output be illegal memory" (#10432) | 2025-09-14 19:03:27 -07:00 |
| fzyzcjy | 258d02c86d | Fix correction bias undefined behavior for nvfp4 models (#10426) | 2025-09-14 18:41:09 -07:00 |
| Yineng Zhang | 55025b9282 | fix: use latest flashinfer (#10428) | 2025-09-14 15:07:14 -07:00 |
| Lianmin Zheng | c9ec4cae5b | Fix the style of sgl kernel (#10398) | 2025-09-12 22:20:21 -07:00 |
| fzyzcjy | 3a77c80b26 | Fix FA4 import cause moe_fused_gate output be illegal memory (#10368) | 2025-09-12 03:21:26 -07:00 |
| Hubert Lu | fe68c1486f | Fix errors of hicache kernels in sgl-kernel for ROCm (#10339) | 2025-09-11 14:54:34 -07:00 |
| Yineng Zhang | 532f998b0f | chore: bump sgl-kernel 0.3.9.post2 (#10311) | 2025-09-11 01:29:50 -07:00 |
| Yineng Zhang | de15d1405a | Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310) | 2025-09-11 01:27:58 -07:00 |
| Yineng Zhang | 5b7448de77 | chore: bump sgl-kernel 0.3.9.post1 (#10294) | 2025-09-10 18:26:34 -07:00 |
| Yineng Zhang | 6d55f60e77 | Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) | 2025-09-10 18:24:23 -07:00 |
| Rain Jiang | 2286e85e77 | pass a_scale from fp8 quant result instead of hard code to 1.0f (#10241); Co-authored-by: Yichen Wang <yichen.wang@bytedance.com>; Co-authored-by: Jinwu Guo <641876696@qq.com> | 2025-09-10 12:56:05 -07:00 |
| huangtingwei | 5be8c2f7f7 | Page first direct IO kernel (#10060); Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2025-09-10 13:35:34 +08:00 |
| Yi Zhang | 8cbe1538ef | Add mamba kernel (#10234) | 2025-09-09 12:58:43 -07:00 |
| Yineng Zhang | f3817cb0b2 | chore: bump v0.3.9 sgl-kernel (#10208) | 2025-09-09 01:40:05 -07:00 |
| Yineng Zhang | 94fb4e9e54 | feat: support fa cute in sgl-kernel (#10205); Co-authored-by: cicirori <32845984+cicirori@users.noreply.github.com> | 2025-09-09 00:14:39 -07:00 |
| blzheng | d1d4074c4e | [CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300) | 2025-09-08 23:23:13 -07:00 |
| Keyang Ru | 718f25ae6e | Explicitly export CMAKE_BUILD_PARALLEL_LEVEL (#10193) | 2025-09-08 22:35:27 -07:00 |
| Yineng Zhang | cdc56ef6c1 | feat: use sgl-kernel cu129 as default (#10188) | 2025-09-08 22:01:17 -07:00 |
| fzyzcjy | 0096798ed6 | [1/2] Speed up prefill mla attention (#10156) | 2025-09-08 09:00:33 -07:00 |
| Yuhao Yao | ee0b3c5bad | [1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel, fixed) (#10108) | 2025-09-07 21:39:07 -07:00 |
| Rain Jiang | 6049ca209e | move compile threads to an option to avoid OOM on low memory host (#10123) | 2025-09-07 21:36:14 -07:00 |
| Cao E | 7577f0e40f | Add graph runner support with torch compile on CPU (#7843) | 2025-09-07 21:33:58 -07:00 |
| Lianmin Zheng | 76a2c86b88 | Fix flashinfer version in sgl-kernel (#10135) | 2025-09-07 12:54:07 -07:00 |
| Qi Yuhang | 85ed8e0a5e | Optimize nvfp4 block scaled gemm kernel when M is small. (#10101) | 2025-09-06 22:31:00 -07:00 |
| Jianying | dd1e268938 | CUTLASS fp8 blockwise gemm support of sm120 (#9969) | 2025-09-06 22:28:54 -07:00 |
| hlu1 | 5f1eb20484 | [chore] Remove unused ep_moe cuda kernels (#9956) | 2025-09-06 01:35:50 -07:00 |
| hlu1 | 039cef76aa | Remove non-accelerated targets(100 and up) from cmake (#10041); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-09-06 01:35:28 -07:00 |
| hlu1 | 4c22ebe2e8 | Disable kernel cutlass_mla_decode on SM103 (#10058); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-09-06 01:35:18 -07:00 |