65 Commits

Author SHA1 Message Date
Lianmin Zheng
2c7f01bc89 Reorganize CI and test files (#9027) 2025-08-10 12:30:06 -07:00
Yineng Zhang
8e8545caf6 fix: update cmake (#8817) 2025-08-05 09:38:30 -07:00
Qiaolin Yu
fc8c8e5041 Integrate triton_kernels in sgl-kernel (#8762) 2025-08-04 12:12:14 -07:00
Baizhou Zhang
91e3d1542e Update Cutlass in sgl-kernel to v4.1 (#8392) 2025-07-27 00:36:15 -07:00
Yineng Zhang
4c605235aa fix: workaround for deepgemm warmup issue (#8302) 2025-07-23 12:01:51 -07:00
Baizhou Zhang
282eb59ff3 Add bf16 output option for dsv3_router_gemm kernel (#7999) 2025-07-20 09:49:37 +08:00
ykcombat
1ebec1a8b0 [Feature] CUDA Green Context Support (#7649) 2025-07-15 02:49:16 +08:00
SijiaYang
da3890e82a [1/n]: add cutlass W4A8 moe kernel for hopper architecture (#7772)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>
2025-07-04 20:50:12 -07:00
AniZpZ
8e03b641ba [1/n] apply wna16marlin kernel in moe weight only quantization (#7683)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>
2025-07-01 23:21:25 -07:00
Baizhou Zhang
7248272ccc Add dsv3 router gemm kernel (#7627) 2025-06-29 23:31:55 -07:00
Ke Bao
04b35190e2 Add dsv3 fused a gemm to sgl-kernel (#7630) 2025-06-29 02:52:24 -07:00
Ruihang Lai
16d76b9f23 [CMake] Fix sgl-kernel CMakeLists for Blackwell (#7543) 2025-06-25 19:00:46 -07:00
Zhiqiang Xie
34c3f9b2d3 kvcache io kernels and test case (#7382) 2025-06-23 11:58:59 -07:00
Lianmin Zheng
55e03b10c4 Fix a bug in BatchTokenIDOut & Misc style and dependency updates (#7457) 2025-06-23 06:20:39 -07:00
Yineng Zhang
7046e0fab7 feat: update blackwell setup (#7119) 2025-06-12 01:54:40 -07:00
Yuan Luo
84727a5139 [sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-06-11 20:43:08 -07:00
JieXin Liang
22fe787852 [sgl-kernel] update deepgemm (#6942) 2025-06-06 23:24:41 -07:00
zyksir
8e3797be1c support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) 2025-06-04 22:11:24 -07:00
Pavani Majety
eb38c7d1ca [1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-06-02 13:48:03 -07:00
Yuan Luo
55444ed667 [EP] Add cuda kernel for moe_ep_pre_reorder (#6699)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-06-01 20:49:01 -07:00
Qiaolin Yu
0b9557fcd7 Disable compiling arch below sm_90 in aarch64 by default (#6380) 2025-05-27 15:50:02 -07:00
HandH1998
4d643f6c7a [1/2] Support Qserve (#6457)
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-05-21 19:48:59 -07:00
Elfie Guo
6fc9357503 [2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694) 2025-05-16 13:14:07 -07:00
Elfie Guo
c23a7072b6 Upgrade CUTLASS 4.0 (#6336)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-05-15 17:42:23 -07:00
Yineng Zhang
213e8c7dd5 chore: upgrade deepgemm (#6073) 2025-05-11 02:17:24 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Yineng Zhang
6f56614734 chore: upgrade cutlass 3.9.2 (#6004)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-06 13:34:08 -07:00
Johnny
9f21e75453 add Thor & Spark (#5915) 2025-04-30 19:43:40 -07:00
PGFLMG
08acdb5c3d [Feat] Scale up fa3 kernel to sm8x arch (#5912)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-04-30 13:59:36 -07:00
zhjunqin
403b855a22 Add sm_120 for blackwell (#5903) 2025-04-29 20:45:24 -07:00
Xiaoyu Zhang
5bb0accbcf cutlass 3.9 supported to improve fp8_blockwise_gemm (#5820) 2025-04-28 21:52:36 -07:00
PGFLMG
ee71ed8a41 [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847)
Co-authored-by: sighingnow <sighingnow@gmail.com>
2025-04-28 11:03:17 -07:00
Yineng Zhang
15fabcc07f fix sgl-kernel unit tests (#5666) 2025-04-23 01:18:30 -07:00
Elfie Guo
e62c49557d [1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281) 2025-04-22 22:28:20 -07:00
PGFLMG
c08a717c77 [Feat] Update sgl-kernel flashinfer to latest main version (#5500)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-04-17 12:43:23 -07:00
Elfie Guo
85ec0440a5 Update cutlass dependency. (#5447) 2025-04-15 23:28:04 -07:00
Lianmin Zheng
838fa0f218 [minor] cleanup cmakelists.txt (#5420) 2025-04-15 07:07:07 -07:00
DefTruth
388e15c0db kernel: support slightly faster merge_state_v2 cuda kernel (#5381) 2025-04-14 21:28:23 -07:00
Yineng Zhang
6c41fcf0e4 chore: upgrade DeepGEMM (#5395) 2025-04-14 20:32:46 -07:00
Lianmin Zheng
dae7944440 minor clean up of sgl-kernel/CMakeLists.txt (#5393) 2025-04-14 18:38:44 -07:00
Yineng Zhang
b62e7e99b8 feat: adapt merge_state (#5337) 2025-04-12 21:14:04 -07:00
PGFLMG
4879e50c6d [Feat] Add sparse attn to sgl-kernel (#5327) 2025-04-12 11:36:36 -07:00
Trevor Morris
f65b8d5c89 Blackwell Cutlass MLA kernel (#5142) 2025-04-11 22:16:51 -07:00
Yineng Zhang
136b8e6afb fix: remove cublas_grouped_gemm (#5307) 2025-04-11 16:22:37 -07:00
Yineng Zhang
7074e9ca20 fix: enable fp4 compilation on cu128 (#5286) 2025-04-11 01:43:44 -07:00
Yi Zhang
bcbbf519f9 sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079) 2025-04-05 14:23:20 -07:00
yinfan98
b8b6008f47 [Fix] fix fa3 build at cu118 (#5036) 2025-04-03 11:52:35 -07:00
Zhiqiang Xie
9d0b36c47a fix deepgemm as well (#5030) 2025-04-03 02:41:37 -07:00
Yuhong Guo
7d8c0ce7ce [Build] Support build sgl-kernel with ccache (#5020) 2025-04-03 00:22:37 -07:00
Zhiqiang Xie
a2aea59b6e update cutlass tag (#5011) 2025-04-02 18:30:30 -07:00