| Author | Commit | Message | Date |
|---|---|---|---|
| Byron Hsu | b5caa22dfb | [kernel] port rope cuda kernel to sgl-kernel (#2993); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-01-20 20:58:51 +08:00 |
| yizhang2077 | 6cb3974e77 | optimize custom allreduce kernel (#2904) | 2025-01-16 03:04:25 +08:00 |
| Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00 |
| Xiaoyu Zhang | e2b16c4716 | add sampling_scaling_penalties kernel (#2846) | 2025-01-12 19:38:17 -08:00 |
| Ke Bao | 58f9060efe | Update int8 gemm config (#2774) | 2025-01-07 19:47:37 +08:00 |
| Ke Bao | 0f3eb1d294 | Support cutlass Int8 gemm (#2752) | 2025-01-06 22:51:22 +08:00 |
| yizhang2077 | 3900a94afe | Support twoshot kernel (#2688) | 2025-01-06 00:47:16 +08:00 |
| Xiaoyu Zhang | ded9fcd09a | improve moe_align_kernel for deepseek v3 (#2735) | 2025-01-06 00:28:22 +08:00 |
| HandH1998 | 77d1210b36 | fix moe_align_block_size (#2615) | 2024-12-27 23:32:53 +08:00 |
| Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00 |
| Yineng Zhang | 31548116a8 | fix moe_align_block_size_kernel for shared memory issue (#2579); Co-authored-by: ispobock <ispobaoke@163.com> | 2024-12-26 05:31:04 +08:00 |
| Yineng Zhang | e8dbdf75bc | fix typo (#2487) | 2024-12-15 13:44:55 +08:00 |
| yizhang2077 | e04d3f2897 | adapt tensorrt llm custom all reduce to sgl-kernel (#2481); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-12-15 13:15:59 +08:00 |
| Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00 |