hlu1
|
7a16db9bd9
|
Make sm100 fp8 kernels available on sm103 (#9789)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-08-28 23:47:29 -07:00 |
|
Rain Jiang
|
6b39f9cf8c
|
Support compile sgl-kernel on cuda 13.0 (#9721)
|
2025-08-28 10:18:03 -07:00 |
|
PGFLMG
|
aa3eba8eb4
|
[sgl-kernel] misc: update deepgemm version for sgl-kernel (#9340)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
|
2025-08-27 12:01:30 -07:00 |
|
Rain Jiang
|
79e6a8a6ac
|
support cuda 13.0 and trtllm kernel by Aug 25 2025 (#9495)
|
2025-08-26 23:13:27 -07:00 |
|
Qi Yuhang
|
fda4792620
|
Update CUTLASS 4.2 & Enable K-Major Scale Factor for SM90 FP8 Blockwise Group GEMM (#9559)
|
2025-08-24 23:24:43 -07:00 |
|
kousakawang
|
5fd311d33e
|
[code clean] add H20 cutlass groupGemm default config (#9333)
Co-authored-by: wanghanpei <wanghanpei@bytedance.com>
|
2025-08-21 19:23:29 -07:00 |
|
Hubert Lu
|
c6c379ab31
|
[AMD] Reorganize hip-related header files in sgl-kernel (#9320)
|
2025-08-18 16:53:44 -07:00 |
|
kousakawang
|
0fc54b971e
|
[fix]: fix cutlass moe ut and Opt H20 cutlass groupGemm performance (#9272)
Co-authored-by: wanghanpei <wanghanpei@bytedance.com>
|
2025-08-17 13:09:49 -07:00 |
|
Peng Zhang
|
5aa1ebd242
|
[2/n]decouple quantization implementation from vLLM dependency (#8112)
Co-authored-by: walker-ai <yiyun.wyt@antgroup.com>
Co-authored-by: leoneo <1320612015@qq.com>
|
2025-08-14 03:19:03 -07:00 |
|
Trevor Morris
|
13c48dcf88
|
[1/2][resubmit again] sgl-kernel: Fuse routed scaling factor into moe_fused_gate (#9088)
|
2025-08-12 20:12:38 -07:00 |
|
Yineng Zhang
|
dd949ace23
|
Revert "[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into m… (#9035)
|
2025-08-10 17:34:54 -07:00 |
|
Trevor Morris
|
591c232f7c
|
[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into moe_fused_gate (select_experts) (#8770)
|
2025-08-08 17:55:06 -07:00 |
|
Qi Yuhang
|
d9def43dcd
|
[Perf]Use Cooperative Schedule for H100 & H200 & H800 in fp8_blockwise_scaled_grouped_mm (#8722)
|
2025-08-02 21:13:47 -07:00 |
|
Liangsheng Yin
|
f9f0138f80
|
Revert "[1/2] sgl-kernel: Fuse routed scaling factor into select_experts" (#8706)
|
2025-08-02 20:14:30 +08:00 |
|
Trevor Morris
|
f642524fd9
|
[1/2] sgl-kernel: Fuse routed scaling factor into select_experts (#8364)
|
2025-08-01 18:14:24 -07:00 |
|
Cheng Wan
|
a5f5ab4030
|
update sgl-kernel for EP: kernel part (#8514)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Ke Bao <ispobaoke@gmail.com>
|
2025-07-30 22:19:55 -07:00 |
|
Qi Yuhang
|
9b9e82539b
|
[Fix]Fix index oob in get_group_gemm_starts kernel. (#8564)
|
2025-07-30 19:49:35 -07:00 |
|
Yuan Luo
|
3bdcdd134b
|
[Hot-Fix] moe_aligned_block_size CI failed in AMD (#8461)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
|
2025-07-31 00:28:32 +08:00 |
|
Xiaoyu Zhang
|
2262369905
|
Revert "[kernel] opt moe align block kernel by block/warp scan algorithm" (#8457)
|
2025-07-28 01:35:43 -07:00 |
|
Yuan Luo
|
af1cc8fe2d
|
[kernel] opt moe align block kernel by block/warp scan algorithm (#7884)
|
2025-07-17 19:33:02 +08:00 |
|
Ke Bao
|
a3398d8478
|
Optimize moe align block size kernel (#7794)
|
2025-07-07 09:20:30 +08:00 |
|
Mick
|
c797322280
|
fix: fix apply_shuffle_mul_sum (#7444)
|
2025-07-04 23:23:30 -07:00 |
|
Qi Yuhang
|
8e9fb43d82
|
Optimize Hopper CUTLASS FP8 Blockwise Grouped GEMM Kernel in Small K Scenario (#7782)
|
2025-07-04 22:25:49 -07:00 |
|
SijiaYang
|
da3890e82a
|
[1/n]: add cutlass W4A8 moe kernel for hopper architecture (#7772)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>
|
2025-07-04 20:50:12 -07:00 |
|
Yi Zhang
|
2998c4bdf4
|
[optimize] fuse renormalize into moe_topk_softmax (#7744)
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2025-07-03 12:42:44 -07:00 |
|
ayrnb
|
2c4feaf308
|
Add CUTLASS FP8 Blockscale MoE kernel for Hopper architecture (#7278)
Co-authored-by: HydraQYH <QYH820@Outlook.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>
|
2025-07-02 23:27:03 -07:00 |
|
AniZpZ
|
8e03b641ba
|
[1/n] apply wna16marlin kernel in moe weight only quantization (#7683)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>
|
2025-07-01 23:21:25 -07:00 |
|
Ke Bao
|
57ab776910
|
Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437)
|
2025-06-24 17:44:27 -07:00 |
|
Yuan Luo
|
84727a5139
|
[sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-06-11 20:43:08 -07:00 |
|
Elfie Guo
|
3e56f557fd
|
Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916)
Co-authored-by: Elfie Guo <elfiegxf@gmail.com>
|
2025-06-07 15:24:39 -07:00 |
|
Xiaoyu Zhang
|
8b5f83ed3b
|
reduce torch.zeros overhead in moe align block size kernel (#6369)
|
2025-06-07 02:47:36 -07:00 |
|
Yuan Luo
|
43baba649e
|
[EP] Add cuda kernel for moe_ep_post_reorder (#6837)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-06-05 00:33:47 -07:00 |
|
Cheng Wan
|
81964328b7
|
Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736)
|
2025-06-04 15:53:22 -07:00 |
|
Xiaoyu Zhang
|
bd75690f4e
|
fix ep_moe_reorder kernel bugs (#6858)
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
|
2025-06-04 19:13:59 +08:00 |
|
Cheng Wan
|
8a5480528d
|
[Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735)
|
2025-06-03 17:48:24 -07:00 |
|
Pavani Majety
|
eb38c7d1ca
|
[1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-06-02 13:48:03 -07:00 |
|
Yuan Luo
|
55444ed667
|
[EP] Add cuda kernel for moe_ep_pre_reorder (#6699)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-06-01 20:49:01 -07:00 |
|
Elfie Guo
|
6fc9357503
|
[2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694)
|
2025-05-16 13:14:07 -07:00 |
|
Elfie Guo
|
e62c49557d
|
[1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281)
|
2025-04-22 22:28:20 -07:00 |
|
Xiaoyu Zhang
|
8e09b37077
|
Sgl kernel fused_moe_gate support n_shared_experts (#5440)
|
2025-04-17 23:05:15 -07:00 |
|
Xiaoyu Zhang
|
f730362ee2
|
reduce moe_align_block_size_kernel small batch mode overhead (#5086)
|
2025-04-09 17:59:35 -07:00 |
|
Qingquan Song
|
45dcfc2e76
|
Add deepseek style fused moe group gate selection kernel (#4530)
|
2025-03-29 11:51:45 -07:00 |
|
Yineng Zhang
|
8bf6d7f406
|
support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
|
2025-03-27 01:42:28 -07:00 |
|
Qingquan Song
|
61e4433caf
|
Add moe topk softmax templated from vllm (#4302)
|
2025-03-14 12:03:33 -07:00 |
|
Shi Shuai
|
817d43705c
|
feat: support ep size < 32 for sgl kernel (#4348)
|
2025-03-12 20:50:46 -07:00 |
|
Lianmin Zheng
|
8abf74e3c9
|
Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-03-08 22:54:51 -08:00 |
|