Author | Commit | Message | Date
Yineng Zhang | 9f8f2c7f74 | update norm cu (#3048) | 2025-01-22 18:58:44 +08:00
Ke Bao | 6fc37bd8ee | Fix sgl-kernel compile for sm80 (#3046) | 2025-01-22 16:49:08 +08:00
Ke Bao | 0ac019f171 | Support sm90 Int8 gemm (#3035) | 2025-01-21 22:21:54 +08:00
Yineng Zhang | 5a0d680a14 | feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) | 2025-01-21 20:44:49 +08:00
Yineng Zhang | ec1c21cdc4 | upgrade torch version for sgl-kernel (#3026) | 2025-01-21 14:32:08 +08:00
Yineng Zhang | 6c856b4f3a | minor: update Makefile for sgl-kernel (#3025) | 2025-01-21 13:08:15 +08:00
Ke Bao | 5dfcacfcb1 | Add compile flags for cutlass 3.x (#3013); Co-authored-by: HandH1998 <1335248067@qq.com> | 2025-01-21 00:04:12 +08:00
Byron Hsu | b5caa22dfb | [kernel] port rope cuda kernel to sgl-kernel (#2993); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-01-20 20:58:51 +08:00
Yineng Zhang | a69cb5cff7 | cleanup unused header in sgl_kernel (#2986) | 2025-01-20 00:44:49 +08:00
Yineng Zhang | d33cbb7e58 | remove cub and add cccl (#2976) | 2025-01-19 15:51:27 +08:00
Yineng Zhang | e2cdc8a5b5 | upgrade cutlass v3.7.0 (#2967) | 2025-01-18 23:37:42 +08:00
lukec | 6f98c586bd | fix sgl-kernel setup.py (#2963) | 2025-01-18 18:50:37 +08:00
Yineng Zhang | 7596417732 | minor: use bear for compilation database (#2919) | 2025-01-16 18:39:11 +08:00
Yineng Zhang | 2dc957d421 | fix setup for sgl kernel (#2917) | 2025-01-16 18:17:34 +08:00
Yineng Zhang | b7f3fec13c | minor: rename bench for sgl kernel (#2909) | 2025-01-16 05:55:43 +08:00
Yineng Zhang | a53454c55e | fix: sgl-kernel link cuda (#2906) | 2025-01-16 04:53:23 +08:00
yizhang2077 | 6cb3974e77 | optimize custom allreduce kernel (#2904) | 2025-01-16 03:04:25 +08:00
Xiaoyu Zhang | f005758f2b | introduce CUB in sgl-kernel (#2887) | 2025-01-14 19:48:59 +08:00
Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00
Xiaoyu Zhang | e2b16c4716 | add sampling_scaling_penalties kernel (#2846) | 2025-01-12 19:38:17 -08:00
Ke Bao | 58f9060efe | Update int8 gemm config (#2774) | 2025-01-07 19:47:37 +08:00
Ke Bao | 0f3eb1d294 | Support cutlass Int8 gemm (#2752) | 2025-01-06 22:51:22 +08:00
Ke Bao | 06dd2eab84 | Remove unused var in moe_align_kernel (#2751) | 2025-01-06 22:13:28 +08:00
Ke Bao | 439f65809f | Fix sgl-kernel cu118 compile issue (#2750) | 2025-01-06 21:59:31 +08:00
yizhang2077 | 3900a94afe | Support twoshot kernel (#2688) | 2025-01-06 00:47:16 +08:00
Xiaoyu Zhang | ded9fcd09a | improve moe_align_kernel for deepseek v3 (#2735) | 2025-01-06 00:28:22 +08:00
Yineng Zhang | b6b57fc200 | minor: cleanup sgl-kernel (#2679) | 2024-12-31 14:52:00 +08:00
Ke Bao | b4403985d0 | Add cutlass submodule for sgl-kernel (#2676) | 2024-12-31 14:28:29 +08:00
Ke Bao | b02da24a5b | Refactor sgl-kernel build (#2642) | 2024-12-30 18:07:01 +08:00
HandH1998 | 77d1210b36 | fix moe_align_block_size (#2615) | 2024-12-27 23:32:53 +08:00
Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00
Yineng Zhang | 2dccecf432 | fix: only enable moe_align_block_size for now (#2590) | 2024-12-26 16:56:59 +08:00
Yineng Zhang | d7c0e872b0 | chore: bump 0.0.2.post8 for sgl-kernel (#2580) | 2024-12-26 06:11:39 +08:00
Yineng Zhang | 31548116a8 | fix moe_align_block_size_kernel for shared memory issue (#2579); Co-authored-by: ispobock <ispobaoke@163.com> | 2024-12-26 05:31:04 +08:00
Yineng Zhang | e8dbdf75bc | fix typo (#2487) | 2024-12-15 13:44:55 +08:00
yizhang2077 | e04d3f2897 | adapt tensorrt llm custom all reduce to sgl-kernel (#2481); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-12-15 13:15:59 +08:00
Yineng Zhang | fccbfa3752 | format: add clang-format for sgl-kernel (#2483) | 2024-12-14 22:36:04 +08:00
Yineng Zhang | 2673fa29d4 | fix: set runtime path (#2466) | 2024-12-12 18:05:48 +08:00
Yineng Zhang | dedaf8cd48 | minor: update pypi tag (#2463) | 2024-12-12 15:21:45 +08:00
Yineng Zhang | 32ed016041 | chore: bump v0.0.2 for sgl-kernel (#2462) | 2024-12-12 14:58:05 +08:00
Yineng Zhang | 7310aede97 | fix: compatible with PEP 440 (#2435) | 2024-12-11 06:48:45 +08:00
Yineng Zhang | 5de9a58eca | fix: use manylinux2014_x86_64 tag (#2434) | 2024-12-11 06:17:41 +08:00
Yineng Zhang | 56fcd8e8a5 | feat: support sgl-kernel PyPI (#2433); Co-authored-by: Zhangyi <1109276519@qq.com> | 2024-12-11 06:06:19 +08:00
Yineng Zhang | 28bc60dcab | misc: update build setup (#2306) | 2024-12-02 02:03:49 +08:00
Yineng Zhang | 7301a39b13 | fix: resolve CodeQL cpp issue (#2305) | 2024-12-01 23:55:19 +08:00
Yineng Zhang | 47eb139f81 | feat: use warp reduce as a simple example (#2304) | 2024-12-01 22:43:50 +08:00
Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00
Lianmin Zheng | b53d6cbda3 | Add new contributors so they can trigger CI automatically (#2269); Co-authored-by: Qun Yang <qun.yang@intel.com>; Co-authored-by: zhengy001 <zhengy.gator@gmail.com>; Co-authored-by: HandH1998 <1335248067@qq.com>; Co-authored-by: xiaobo <xiaob.chen@outlook.com> | 2024-11-29 16:37:52 -08:00
Yineng Zhang | 419a57e771 | minor: add sgl-kernel dir (#2261) | 2024-11-30 02:27:35 +08:00