Xiaoyu Zhang
|
8e09b37077
|
Sgl kernel fused_moe_gate support n_shared_experts (#5440)
|
2025-04-17 23:05:15 -07:00 |
|
PGFLMG
|
c08a717c77
|
[Feat] Update sgl-kernel flashinfer to latest main version (#5500)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-04-17 12:43:23 -07:00 |
|
Trevor Morris
|
e8f62b20ca
|
Blackwell cutlass mla: Add check for bad page size/block num combinations (#5431)
|
2025-04-15 14:07:42 -07:00 |
|
Yineng Zhang
|
6f509d5503
|
chore: bump sgl-kernel v0.0.9.post1 (#5430)
|
2025-04-15 11:00:21 -07:00 |
|
Yineng Zhang
|
e940dc4f06
|
chore: bump sgl-kernel 0.0.9 (#5400)
|
2025-04-14 21:34:04 -07:00 |
|
DefTruth
|
388e15c0db
|
kernel: support slightly faster merge_state_v2 cuda kernel (#5381)
|
2025-04-14 21:28:23 -07:00 |
|
Yineng Zhang
|
b62e7e99b8
|
feat: adapt merge_state (#5337)
|
2025-04-12 21:14:04 -07:00 |
|
Yineng Zhang
|
b371f7cd36
|
chore: bump sgl-kernel v0.0.8.post3 (#5332)
|
2025-04-12 12:53:37 -07:00 |
|
PGFLMG
|
4879e50c6d
|
[Feat] Add sparse attn to sgl-kernel (#5327)
|
2025-04-12 11:36:36 -07:00 |
|
Yineng Zhang
|
115ae2e728
|
chore: bump sgl-kernel v0.0.8.post2 (#5317)
|
2025-04-11 23:42:03 -07:00 |
|
Baizhou Zhang
|
e4155e96d0
|
Add flash_attn_varlen_func to sgl-kernel (#5315)
|
2025-04-11 23:36:36 -07:00 |
|
Trevor Morris
|
f65b8d5c89
|
Blackwell Cutlass MLA kernel (#5142)
|
2025-04-11 22:16:51 -07:00 |
|
Yineng Zhang
|
4f288113ce
|
fix: update flash attn (#5308)
|
2025-04-11 16:23:09 -07:00 |
|
Yineng Zhang
|
136b8e6afb
|
fix: remove cublas_grouped_gemm (#5307)
|
2025-04-11 16:22:37 -07:00 |
|
Yineng Zhang
|
c163bf4ff1
|
chore: bump sgl-kernel v0.0.8.post1 (#5289)
|
2025-04-11 02:11:53 -07:00 |
|
Yineng Zhang
|
496dde8491
|
bump sgl-kernel 0.0.8 (#5089)
|
2025-04-05 14:28:04 -07:00 |
|
Yi Zhang
|
bcbbf519f9
|
sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079)
|
2025-04-05 14:23:20 -07:00 |
|
Yineng Zhang
|
d7954b7682
|
bump sgl-kernel v0.0.7 (#5046)
|
2025-04-03 13:38:13 -07:00 |
|
yinfan98
|
b8b6008f47
|
[Fix] fix fa3 build at cu118 (#5036)
|
2025-04-03 11:52:35 -07:00 |
|
Yineng Zhang
|
6384d31776
|
bump sgl-kernel v0.0.6 (#4950)
|
2025-03-31 11:24:09 -07:00 |
|
yinfan98
|
37c66ec856
|
[feat] add fa3 in sgl-kernel (#4902)
Co-authored-by: Sleepcoo <Sleepcoo@gmail.com>
|
2025-03-30 12:57:10 -07:00 |
|
yinfan98
|
0d7fe866f9
|
[Misc] Clean m.def and add Development Tips (#4890)
|
2025-03-29 23:06:18 -07:00 |
|
yinfan98
|
8e7b31546c
|
quick fix: add default for new kernel (#4898)
|
2025-03-29 12:31:59 -07:00 |
|
Qingquan Song
|
45dcfc2e76
|
Add deepseek style fused moe group gate selection kernel (#4530)
|
2025-03-29 11:51:45 -07:00 |
|
Yineng Zhang
|
92941ce7b5
|
bump sgl-kernel 0.0.5.post4 (#4768)
|
2025-03-28 14:40:53 -07:00 |
|
Yineng Zhang
|
31dfff7da7
|
use default for torch.ops (#4835)
|
2025-03-27 19:09:58 -07:00 |
|
Yineng Zhang
|
8bf6d7f406
|
support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
|
2025-03-27 01:42:28 -07:00 |
|
Trevor Morris
|
e9f8e42318
|
Support FP4 gemm (1/2) (#3899)
|
2025-03-24 19:50:23 -07:00 |
|
Chunan Zeng
|
65c24c28f9
|
[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)
|
2025-03-23 23:44:17 -07:00 |
|
Yineng Zhang
|
988ab646ec
|
bump v0.0.5.post3 (#4520)
|
2025-03-17 13:05:59 -07:00 |
|
Lianmin Zheng
|
3db35c1af4
|
Release sgl-kernel v0.0.5.post2 (#4469)
|
2025-03-16 01:01:53 -07:00 |
|
Ying Sheng
|
52a34d7448
|
Add greedy verification kernel (#4383)
|
2025-03-16 00:58:26 -07:00 |
|
Yineng Zhang
|
862fe52241
|
bump v0.0.5.post1 (#4437)
|
2025-03-14 15:00:26 -07:00 |
|
Qingquan Song
|
61e4433caf
|
Add moe topk softmax templated from vllm (#4302)
|
2025-03-14 12:03:33 -07:00 |
|
Yineng Zhang
|
2a4cbad8e9
|
bump 0.0.5 sgl-kernel (#4377)
|
2025-03-13 02:08:35 -07:00 |
|
Yineng Zhang
|
6e7239f912
|
release 0.0.4.post3 sgl-kernel (#4331)
|
2025-03-12 01:05:16 -07:00 |
|
Yineng Zhang
|
0a3960f21f
|
fix awq_dequantize (#4333)
|
2025-03-12 01:04:38 -07:00 |
|
Rex
|
07f944631e
|
Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)
|
2025-03-12 00:10:02 -07:00 |
|
Yineng Zhang
|
cd90945518
|
bump sgl-kernel 0.0.4.post2 (#4288)
|
2025-03-11 00:09:47 -07:00 |
|
Lianmin Zheng
|
cf0ccd406e
|
Optimize rope in sgl kernel (#4267)
|
2025-03-10 10:07:45 -07:00 |
|
Lianmin Zheng
|
1a5023e05d
|
Release sgl-kernel v0.0.4.post1 (#4255)
|
2025-03-10 02:39:50 -07:00 |
|
Yineng Zhang
|
5c7dd14ba1
|
chore: bump v0.0.4 for sgl-kernel (#4223)
|
2025-03-08 23:01:59 -08:00 |
|
Lianmin Zheng
|
8abf74e3c9
|
Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-03-08 22:54:51 -08:00 |
|