| Chunyuan WU | 7eb47b0f3d | [CPU] [BF16] Call fused_experts_cpu, weight_packed_linear and bmm_cpu kernel in DeepSeek model (#6641) Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg> | 2025-06-25 01:43:33 -07:00 |
| Ke Bao | 57ab776910 | Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437) | 2025-06-24 17:44:27 -07:00 |
| Zhiqiang Xie | 34c3f9b2d3 | kvcache io kernels and test case (#7382) | 2025-06-23 11:58:59 -07:00 |
| AniZpZ | 3eb4a800e8 | Fix AWQ Dequant and Weight Loading of deepseek v2 (#6842) | 2025-06-17 13:45:10 -07:00 |
| Lianmin Zheng | cfceb83d05 | Fix sampling for speculative decoding & simplify kernels (#7207) | 2025-06-16 03:28:30 -07:00 |
| JieXin Liang | ab1a4fa5cb | [fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184) | 2025-06-14 12:45:41 -07:00 |
| fzyzcjy | 5c66c4424f | Support new DeepGEMM format in per token group quant (#7146) | 2025-06-13 02:00:22 -07:00 |
| fzyzcjy | aa46ed34d2 | Remove 200us slow concat kernel (part 1: kernel) (#7145) | 2025-06-13 01:58:29 -07:00 |
| Yuan Luo | 84727a5139 | [sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-06-11 20:43:08 -07:00 |
| fzyzcjy | 19995dd78e | Tiny fix cutlass_mla_get_workspace_size stub incorrect signature (#7057) | 2025-06-10 12:27:57 -07:00 |
| YanbingJiang | fcde67b016 | CPU: map changes from developing branch in sgl-kernel (#6833) Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-06-10 01:08:15 -07:00 |
| JieXin Liang | 18efb5e8e0 | [perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 (#6929) | 2025-06-08 19:37:34 -07:00 |
| Elfie Guo | 3e56f557fd | Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916) Co-authored-by: Elfie Guo <elfiegxf@gmail.com> | 2025-06-07 15:24:39 -07:00 |
| Xiaoyu Zhang | 8b5f83ed3b | reduce torch.zeros overhead in moe align block size kernel (#6369) | 2025-06-07 02:47:36 -07:00 |
| Yuan Luo | 43baba649e | [EP] Add cuda kernel for moe_ep_post_reorder (#6837) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-06-05 00:33:47 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Xiaoyu Zhang | bd75690f4e | fix ep_moe_reorder kernel bugs (#6858) Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com> | 2025-06-04 19:13:59 +08:00 |
| Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00 |
| jianan-gu | ff00895c46 | Add CPU optimized kernels for topk and rope fusions (#6456) | 2025-06-02 17:37:34 -07:00 |
| Pavani Majety | eb38c7d1ca | [1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093) Signed-off-by: Pavani Majety <pmajety@nvidia.com> | 2025-06-02 13:48:03 -07:00 |
| Yuan Luo | 55444ed667 | [EP] Add cuda kernel for moe_ep_pre_reorder (#6699) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-06-01 20:49:01 -07:00 |
| Chunyuan WU | 3ded6235c9 | Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404) | 2025-05-23 02:01:55 -07:00 |
| blzheng | 4ba1eea83f | Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493) | 2025-05-23 00:14:46 -07:00 |
| HandH1998 | 4d643f6c7a | [1/2] Support Qserve (#6457) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com> | 2025-05-21 19:48:59 -07:00 |
| blzheng | cfe48c5902 | [CPU] Fix build issue (#6419) | 2025-05-21 11:17:10 -07:00 |
| YanbingJiang | 32cc66efa5 | Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405) Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-19 21:23:17 -07:00 |
| Chunyuan WU | 5dd62c3a6f | Add fp8 shared_expert kernel for CPU in sgl-kernel and add UT (#6339) Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com> Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-18 12:42:15 -07:00 |
| Elfie Guo | 6fc9357503 | [2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694) | 2025-05-16 13:14:07 -07:00 |
| Elfie Guo | c23a7072b6 | Upgrade CUTLASS 4.0 (#6336) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-05-15 17:42:23 -07:00 |
| Chunyuan WU | fb4959b2c5 | Add fp8 gemm kernel for CPU in sgl-kernel and add gemm UT (#6216) Co-authored-by: YanbingJiang <yanbing.jiang@intel.com> Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-15 09:10:40 -07:00 |
| blzheng | 0f75b907c6 | [CPU] Add CMakeLists.txt for sgl-kernel (#6115) | 2025-05-13 15:30:37 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| Trevor Morris | 0ab3f437ab | Cutlass MLA: Disable split kv due to https://github.com/NVIDIA/cutlass/issues/2274 (#6101) | 2025-05-08 18:44:30 -07:00 |
| PGFLMG | f6f96b0521 | [sgl-kernel] fix: fix cu118 compile error (#6123) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-05-08 14:26:51 -07:00 |
| Xiaoyu Zhang | d25398cbc8 | fix custom_allreduce namespace (#6039) | 2025-05-06 19:13:06 -07:00 |
| Yineng Zhang | 6f56614734 | chore: upgrade cutlass 3.9.2 (#6004) Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-06 13:34:08 -07:00 |
| Xiaoyu Zhang | 5bb0accbcf | cutlass 3.9 supported to improve fp8_blockwise_gemm (#5820) | 2025-04-28 21:52:36 -07:00 |
| PGFLMG | ee71ed8a41 | [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847) Co-authored-by: sighingnow <sighingnow@gmail.com> | 2025-04-28 11:03:17 -07:00 |
| Yineng Zhang | ce4ecba477 | fix: only compile ApplyTokenBitmaskInplace cu124+ (#5686) | 2025-04-23 14:17:42 -07:00 |
| Yineng Zhang | 15fabcc07f | fix sgl-kernel unit tests (#5666) | 2025-04-23 01:18:30 -07:00 |
| Elfie Guo | e62c49557d | [1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281) | 2025-04-22 22:28:20 -07:00 |
| Xiaoyu Zhang | 8e09b37077 | Sgl kernel fused_moe_gate support n_shared_experts (#5440) | 2025-04-17 23:05:15 -07:00 |
| PGFLMG | c08a717c77 | [Feat] Update sgl-kernel flashinfer to latest main version (#5500) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-04-17 12:43:23 -07:00 |
| DefTruth | 12ef7e3bc3 | bugfix: fix merge_state_v2 cuda graph (#5419) | 2025-04-15 10:18:47 -07:00 |
| DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00 |
| Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00 |
| Yineng Zhang | 812e82f35e | fix: solve cu118 issue for cutlass mla (#5331) | 2025-04-12 12:51:09 -07:00 |
|