Commit Graph

15 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| HandH1998 | 4d643f6c7a | [1/2] Support Qserve (#6457) (Co-authored-by: yych0745 <1398089567@qq.com>, sleepcoo <sleepcoo@gmail.com>) | 2025-05-21 19:48:59 -07:00 |
| Elfie Guo | 6fc9357503 | [2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694) | 2025-05-16 13:14:07 -07:00 |
| PGFLMG | ee71ed8a41 | [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847) (Co-authored-by: sighingnow <sighingnow@gmail.com>) | 2025-04-28 11:03:17 -07:00 |
| Yineng Zhang | 15fabcc07f | fix sgl-kernel unit tests (#5666) | 2025-04-23 01:18:30 -07:00 |
| Elfie Guo | e62c49557d | [1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (#5281) | 2025-04-22 22:28:20 -07:00 |
| Xiaoyu Zhang | 8e09b37077 | Sgl kernel fused_moe_gate support n_shared_experts (#5440) | 2025-04-17 23:05:15 -07:00 |
| PGFLMG | c08a717c77 | [Feat] Update sgl-kernel flashinfer to latest main version (#5500) (Co-authored-by: zhyncs <me@zhyncs.com>) | 2025-04-17 12:43:23 -07:00 |
| DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00 |
| Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00 |
| PGFLMG | 4879e50c6d | [Feat] Add sparse attn to sgl-kernel (#5327) | 2025-04-12 11:36:36 -07:00 |
| Trevor Morris | f65b8d5c89 | Blackwell Cutlass MLA kernel (#5142) | 2025-04-11 22:16:51 -07:00 |
| Yineng Zhang | 136b8e6afb | fix: remove cublas_grouped_gemm (#5307) | 2025-04-11 16:22:37 -07:00 |
| Richard Zou | 76f44c2a8d | Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213) | 2025-04-10 09:14:38 -07:00 |
| Yi Zhang | bcbbf519f9 | sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079) | 2025-04-05 14:23:20 -07:00 |
| yinfan98 | b8b6008f47 | [Fix] fix fa3 build at cu118 (#5036) | 2025-04-03 11:52:35 -07:00 |