Commit Graph

49 Commits

Author
SHA1 Message Date
Yineng Zhang
496dde8491 bump sgl-kernel 0.0.8 (#5089) 2025-04-05 14:28:04 -07:00
Yineng Zhang
d7954b7682 bump sgl-kernel v0.0.7 (#5046) 2025-04-03 13:38:13 -07:00
Yineng Zhang
6384d31776 bump sgl-kernel v0.0.6 (#4950) 2025-03-31 11:24:09 -07:00
Yineng Zhang
92941ce7b5 bump sgl-kernel 0.0.5.post4 (#4768) 2025-03-28 14:40:53 -07:00
Yineng Zhang
8bf6d7f406 support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-03-27 01:42:28 -07:00
Yineng Zhang
988ab646ec bump v0.0.5.post3 (#4520) 2025-03-17 13:05:59 -07:00
Lianmin Zheng
3db35c1af4 Release sgl-kernel v0.0.5.post2 (#4469) 2025-03-16 01:01:53 -07:00
Yineng Zhang
862fe52241 bump v0.0.5.post1 (#4437) 2025-03-14 15:00:26 -07:00
Yineng Zhang
4ff1264201 Update pyproject.toml 2025-03-13 02:16:51 -07:00
Yineng Zhang
2a4cbad8e9 bump 0.0.5 sgl-kernel (#4377) 2025-03-13 02:08:35 -07:00
Yineng Zhang
6e7239f912 release 0.0.4.post3 sgl-kernel (#4331) 2025-03-12 01:05:16 -07:00
Yineng Zhang
cd90945518 bump sgl-kernel 0.0.4.post2 (#4288) 2025-03-11 00:09:47 -07:00
Lianmin Zheng
1a5023e05d Release sgl-kernel v0.0.4.post1 (#4255) 2025-03-10 02:39:50 -07:00
laixin
c553e1604c DeepGemm integrate to sgl-kernel (#4165)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-03-10 00:35:07 -07:00
Lianmin Zheng
eb06dbcbf8 Move rope and bmm into sgl-kernel (#4241) 2025-03-09 18:38:15 -07:00
Yineng Zhang
5c7dd14ba1 chore: bump v0.0.4 for sgl-kernel (#4223) 2025-03-08 23:01:59 -08:00
Lianmin Zheng
8abf74e3c9 Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-03-08 22:54:51 -08:00
Yineng Zhang
96263f275c chore: bump v0.0.3.post7 for sgl-kernel (#4176) 2025-03-07 01:15:34 -08:00
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
Yineng Zhang
e082142519 chore: bump 0.0.3.post6 sgl-kernel (#3555) 2025-02-14 08:55:15 +08:00
Yineng Zhang
4430c0a513 chore: bump 0.0.3.post5 sgl-kernel (#3530) 2025-02-13 01:51:46 +08:00
Yineng Zhang
b96e92e6e6 chore: bump 0.0.3.post4 sgl-kernel (#3523) 2025-02-12 17:28:36 +08:00
Yineng Zhang
6239d0b2e7 chore: bump sgl-kernel v0.0.3.post3 (#3440) 2025-02-10 04:00:52 +08:00
Yineng Zhang
f9905d59a8 support speculative decoding kernel in sgl-kernel (#3373)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-07 20:29:51 +08:00
Yineng Zhang
c38b5fb4f4 update 3rdparty and rms norm for sgl-kernel (#3213) 2025-01-30 19:32:21 +08:00
Yineng Zhang
8a96f74988 chore: bump 0.0.3 for sgl-kernel (#3178)
Co-authored-by: ispobock <ispobaoke@hotmail.com>
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
2025-01-27 20:29:28 +08:00
Byron Hsu
514f37c32b [kernel] Fix position ids in rope (#3173) 2025-01-27 17:09:51 +08:00
Byron Hsu
741fccd7bf Bump sgl kernel to 0.0.2.post19 (#3167) 2025-01-27 15:36:07 +08:00
Yineng Zhang
318260c0fa chore: bump 0.0.2.post18 for sgl-kernel (#3149) 2025-01-26 19:00:34 +08:00
Yineng Zhang
896c07441e update installation doc for sgl-kernel (#3129) 2025-01-26 00:00:13 +08:00
Yineng Zhang
14e754a868 chore: bump v0.0.2.post17 for sgl-kernel (#3125) 2025-01-25 20:43:02 +08:00
Yineng Zhang
54bac8af0b chore: bump sgl-kernel 0.0.2.post16 (#3087) 2025-01-24 01:57:48 +08:00
Yineng Zhang
1f6cf0d4b9 fix build error for sgl-kernel (#3078) 2025-01-23 19:16:35 +08:00
Lianmin Zheng
553f5a3ffe Remove torch dependency in sgl-kernel (#3074) 2025-01-23 17:23:37 +08:00
Byron Hsu
b5caa22dfb [kernel] port rope cuda kernel to sgl-kernel (#2993)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-01-20 20:58:51 +08:00
Yineng Zhang
a53454c55e fix: sgl-kernel link cuda (#2906) 2025-01-16 04:53:23 +08:00
yizhang2077
6cb3974e77 optimize custom allreduce kernel (#2904) 2025-01-16 03:04:25 +08:00
Xiaoyu Zhang
e2b16c4716 add sampling_scaling_penalties kernel (#2846) 2025-01-12 19:38:17 -08:00
yizhang2077
3900a94afe Support twoshot kernel (#2688) 2025-01-06 00:47:16 +08:00
HandH1998
77d1210b36 fix moe_align_block_size (#2615) 2024-12-27 23:32:53 +08:00
Yineng Zhang
2dccecf432 fix: only enable moe_align_block_size for now (#2590) 2024-12-26 16:56:59 +08:00
Yineng Zhang
d7c0e872b0 chore: bump 0.0.2.post8 for sgl-kernel (#2580) 2024-12-26 06:11:39 +08:00
Yineng Zhang
31548116a8 fix moe_align_block_size_kernel for shared memory issue (#2579)
Co-authored-by: ispobock <ispobaoke@163.com>
2024-12-26 05:31:04 +08:00
yizhang2077
e04d3f2897 adapt tensorrt llm custom all reduce to sgl-kernel (#2481)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-12-15 13:15:59 +08:00
Yineng Zhang
2673fa29d4 fix: set runtime path (#2466) 2024-12-12 18:05:48 +08:00
Yineng Zhang
56fcd8e8a5 feat: support sgl-kernel PyPI (#2433)
Co-authored-by: Zhangyi <1109276519@qq.com>
2024-12-11 06:06:19 +08:00
Yineng Zhang
47eb139f81 feat: use warp reduce as a simple example (#2304) 2024-12-01 22:43:50 +08:00
Yineng Zhang
5c91a315d7 feat: support sgl-kernel pypi (#2302) 2024-12-01 20:11:21 +08:00