| Xiaoyu Zhang | 8e09b37077 | Sgl kernel fused_moe_gate support n_shared_experts (#5440) | 2025-04-17 23:05:15 -07:00 |
| PGFLMG | c08a717c77 | [Feat] Update sgl-kernel flashinfer to latest main version (#5500) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-04-17 12:43:23 -07:00 |
| DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00 |
| Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00 |
| PGFLMG | 4879e50c6d | [Feat] Add sparse attn to sgl-kernel (#5327) | 2025-04-12 11:36:36 -07:00 |
| Trevor Morris | f65b8d5c89 | Blackwell Cutlass MLA kernel (#5142) | 2025-04-11 22:16:51 -07:00 |
| Yineng Zhang | 136b8e6afb | fix: remove cublas_grouped_gemm (#5307) | 2025-04-11 16:22:37 -07:00 |
| Richard Zou | 76f44c2a8d | Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213) | 2025-04-10 09:14:38 -07:00 |
| Yi Zhang | bcbbf519f9 | sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079) | 2025-04-05 14:23:20 -07:00 |
| yinfan98 | b8b6008f47 | [Fix] fix fa3 build at cu118 (#5036) | 2025-04-03 11:52:35 -07:00 |