Yineng Zhang | 827aa8730b | cleanup sgl-kernel kernels (#3175) | 2025-01-27 19:11:01 +08:00
|
Byron Hsu | 514f37c32b | [kernel] Fix position ids in rope (#3173) | 2025-01-27 17:09:51 +08:00
|
Byron Hsu | fb11a43981 | [kernel] Integrate flashinfer's rope with higher precision and better perf (#3134) | 2025-01-27 15:28:00 +08:00
|
HandH1998 | 82392da830 | support w8a8 fp8 kernel with CUTLASS (#3047); Co-authored-by: yych0745 <1398089567@qq.com> | 2025-01-26 15:46:51 +08:00
|
Yineng Zhang | 95f789adb0 | minor: cleanup sgl-kernel (#3143) | 2025-01-26 14:29:58 +08:00
|
Xiaoyu Zhang | 5d9d15e70f | support fp32 in sampling_scaling_penalties kernel (#3121) | 2025-01-25 16:52:17 +08:00
|
Yineng Zhang | 5de4051bcf | feat: integrate sampling kernels into sgl-kernel (#3086); Co-authored-by: Zihao Ye <expye@outlook.com> | 2025-01-24 01:54:47 +08:00
|
Xiaoyu Zhang | e0cd65c2b6 | [hotfix] fix test_sampling_scaling_penalties.py ci test (#3084) | 2025-01-24 00:33:59 +08:00
|
Xiaoyu Zhang | f1b6861828 | use flashinfer vec_dtypes in sgl_kernel (#3083) | 2025-01-23 22:19:04 +08:00
|
Yineng Zhang | 0da0989ad4 | sync flashinfer and update sgl-kernel tests (#3081) | 2025-01-23 21:13:55 +08:00
|
Xiaoyu Zhang | ac2dc35d0e | support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) | 2025-01-23 15:29:20 +08:00
|
Yineng Zhang | bf669606eb | feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) | 2025-01-23 00:39:38 +08:00
|
Yineng Zhang | 9d9b482a39 | feat: integrate activation kernels into sgl-kernel (#3053) | 2025-01-22 23:25:45 +08:00
|
Yineng Zhang | 7353fb9b97 | feat: integrate norm kernels into sgl-kernel (#3052) | 2025-01-22 21:32:48 +08:00
|
Ke Bao | 0ac019f171 | Support sm90 Int8 gemm (#3035) | 2025-01-21 22:21:54 +08:00
|
Yineng Zhang | 5a0d680a14 | feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) | 2025-01-21 20:44:49 +08:00
|
Byron Hsu | b5caa22dfb | [kernel] port rope cuda kernel to sgl-kernel (#2993); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-01-20 20:58:51 +08:00
|
yizhang2077 | 6cb3974e77 | optimize custom allreduce kernel (#2904) | 2025-01-16 03:04:25 +08:00
|
Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00
|
Xiaoyu Zhang | e2b16c4716 | add sampling_scaling_penalties kernel (#2846) | 2025-01-12 19:38:17 -08:00
|
Ke Bao | 58f9060efe | Update int8 gemm config (#2774) | 2025-01-07 19:47:37 +08:00
|
Ke Bao | 0f3eb1d294 | Support cutlass Int8 gemm (#2752) | 2025-01-06 22:51:22 +08:00
|
yizhang2077 | 3900a94afe | Support twoshot kernel (#2688) | 2025-01-06 00:47:16 +08:00
|
Xiaoyu Zhang | ded9fcd09a | improve moe_align_kernel for deepseek v3 (#2735) | 2025-01-06 00:28:22 +08:00
|
HandH1998 | 77d1210b36 | fix moe_align_block_size (#2615) | 2024-12-27 23:32:53 +08:00
|
Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00
|
Yineng Zhang | 31548116a8 | fix moe_align_block_size_kernel for shared memory issue (#2579); Co-authored-by: ispobock <ispobaoke@163.com> | 2024-12-26 05:31:04 +08:00
|
Yineng Zhang | e8dbdf75bc | fix typo (#2487) | 2024-12-15 13:44:55 +08:00
|
yizhang2077 | e04d3f2897 | adapt tensorrt llm custom all reduce to sgl-kernel (#2481); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-12-15 13:15:59 +08:00
|
Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00
|