Xiaoyu Zhang
|
55a7ec388f
|
use warp shuffle style reduce and flashinfer vectorize (#3628)
|
2025-02-19 20:53:51 +08:00 |
|
Baizhou Zhang
|
67fc595bb8
|
[Feature] Apply Cublas Grouped Gemm kernel (#3629)
|
2025-02-18 15:18:31 +08:00 |
|
Xiaoyu Zhang
|
3efbdf68b9
|
fix sgl-kernel codestyle (#3563)
|
2025-02-14 18:05:52 +08:00 |
|
Yineng Zhang
|
e082142519
|
chore: bump 0.0.3.post6 sgl-kernel (#3555)
|
2025-02-14 08:55:15 +08:00 |
|
Xiaoyu Zhang
|
f076328bb7
|
fix moe_align_kernel shm init not sync bug (#3534)
|
2025-02-13 16:47:00 +08:00 |
|
Yineng Zhang
|
4430c0a513
|
chore: bump 0.0.3.post5 sgl-kernel (#3530)
|
2025-02-13 01:51:46 +08:00 |
|
yizhang2077
|
640363ad20
|
support blockwise fp8 matmul kernel (#3267)
|
2025-02-13 01:49:33 +08:00 |
|
Yineng Zhang
|
b96e92e6e6
|
chore: bump 0.0.3.post4 sgl-kernel (#3523)
|
2025-02-12 17:28:36 +08:00 |
|
Xiaoyu Zhang
|
bb418ced80
|
optimize per token group quant fp8 (#3490)
|
2025-02-11 22:19:05 +08:00 |
|
Yineng Zhang
|
6239d0b2e7
|
chore: bump sgl-kernel v0.0.3.post3 (#3440)
|
2025-02-10 04:00:52 +08:00 |
|
Yineng Zhang
|
4cfd3add6d
|
support version in sgl-kernel (#3439)
|
2025-02-10 03:49:52 +08:00 |
|
Yineng Zhang
|
29daf498cd
|
fix cu118 link issue (#3421)
|
2025-02-09 18:16:44 +08:00 |
|
Yineng Zhang
|
f9905d59a8
|
support speculative decoding kernel in sgl-kernel (#3373)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2025-02-07 20:29:51 +08:00 |
|
Yineng Zhang
|
45c87e083f
|
fix undefined symbol cudaGetDriverEntryPointByVersion (#3372)
|
2025-02-07 19:32:45 +08:00 |
|
Xiaoyu Zhang
|
cdae77b03d
|
optimize moe_align_kernel cuda (#3347)
|
2025-02-07 00:53:46 +08:00 |
|
Yineng Zhang
|
adeee15204
|
fix sgl-kernel build failure on AMD (#3352)
|
2025-02-07 00:35:59 +08:00 |
|
Xiaoyu Zhang
|
ad3499858e
|
clean moe align block kernel code and add acc test (#3332)
|
2025-02-06 16:42:36 +08:00 |
|
HAI
|
2c1a695ff1
|
ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)
|
2025-02-04 21:44:44 +08:00 |
|
Yineng Zhang
|
00fa7d0417
|
add copyright for sgl-kernel (#3270)
|
2025-02-03 21:34:44 +08:00 |
|
Yineng Zhang
|
7876279ea7
|
update cutlass dependency (#3240)
|
2025-02-01 03:13:44 +08:00 |
|
Yineng Zhang
|
3ee62235c6
|
revert the MoE dependence (#3230)
|
2025-01-31 16:51:41 +08:00 |
|
Yineng Zhang
|
9602c2aac7
|
keep the parts needed for moe_kernels (#3218)
|
2025-01-31 00:39:47 +08:00 |
|
Yineng Zhang
|
e81d7f11de
|
add tensorrt_llm moe_gemm as 3rdparty (#3217)
|
2025-01-30 23:49:14 +08:00 |
|
Yineng Zhang
|
222ce6f1da
|
add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216)
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
|
2025-01-30 23:04:41 +08:00 |
|
Yineng Zhang
|
468d23cff9
|
update setup for sgl-kernel (#3214)
|
2025-01-30 19:47:50 +08:00 |
|
Yineng Zhang
|
c38b5fb4f4
|
update 3rdparty and rms norm for sgl-kernel (#3213)
|
2025-01-30 19:32:21 +08:00 |
|
Xiaoyu Zhang
|
81262c7b72
|
clean up useless file (#3192)
|
2025-01-28 14:29:30 +08:00 |
|
Yineng Zhang
|
8a96f74988
|
chore: bump 0.0.3 for sgl-kernel (#3178)
Co-authored-by: ispobock <ispobaoke@hotmail.com>
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
|
2025-01-27 20:29:28 +08:00 |
|
Yineng Zhang
|
827aa8730b
|
cleanup sgl-kernel kernels (#3175)
|
2025-01-27 19:11:01 +08:00 |
|
Lianmin Zheng
|
53cef81587
|
Improve weight loading and code style (#3174)
|
2025-01-27 03:00:41 -08:00 |
|
Byron Hsu
|
514f37c32b
|
[kernel] Fix position ids in rope (#3173)
|
2025-01-27 17:09:51 +08:00 |
|
Byron Hsu
|
741fccd7bf
|
Bump sgl kernel to 0.0.2.post19 (#3167)
|
2025-01-27 15:36:07 +08:00 |
|
Byron Hsu
|
fb11a43981
|
[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134)
|
2025-01-27 15:28:00 +08:00 |
|
Yineng Zhang
|
f265d15b96
|
use self-hosted to build sgl-kernel (#3154)
|
2025-01-26 23:02:57 +08:00 |
|
Yineng Zhang
|
02431b9ad2
|
fix link in README (#3153)
|
2025-01-26 21:30:00 +08:00 |
|
Yineng Zhang
|
318260c0fa
|
chore: bump 0.0.2.post18 for sgl-kernel (#3149)
|
2025-01-26 19:00:34 +08:00 |
|
HandH1998
|
82392da830
|
support w8a8 fp8 kernel with CUTLASS (#3047)
Co-authored-by: yych0745 <1398089567@qq.com>
|
2025-01-26 15:46:51 +08:00 |
|
Yineng Zhang
|
95f789adb0
|
minor: cleanup sgl-kernel (#3143)
|
2025-01-26 14:29:58 +08:00 |
|
yinfan98
|
9286740eff
|
feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130)
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>
Co-authored-by: yinfan98 <1106110035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-01-26 02:55:08 +08:00 |
|
Yineng Zhang
|
896c07441e
|
update installation doc for sgl-kernel (#3129)
|
2025-01-26 00:00:13 +08:00 |
|
Yineng Zhang
|
14e754a868
|
chore: bump v0.0.2.post17 for sgl-kernel (#3125)
|
2025-01-25 20:43:02 +08:00 |
|
yizhang2077
|
98522149ff
|
mirror fix for custom allreduce (#3124)
|
2025-01-25 18:26:41 +08:00 |
|
Xiaoyu Zhang
|
5d9d15e70f
|
support fp32 in sampling_scaling_penalties kernel (#3121)
|
2025-01-25 16:52:17 +08:00 |
|
Ke Bao
|
a22f60a313
|
Add workflow for sgl-kernel cu118 release (#3109)
|
2025-01-24 22:30:30 +08:00 |
|
Yineng Zhang
|
04f0b4cbef
|
minor: update sgl-kernel setup (#3107)
|
2025-01-24 20:10:35 +08:00 |
|
Trevor Morris
|
685a5738a7
|
Allow local cutlass directory to be used in sgl-kernel build (#3037)
|
2025-01-24 03:59:47 -08:00 |
|
Yineng Zhang
|
153b414e83
|
minor: sync flashinfer and add turbomind as 3rdparty (#3105)
|
2025-01-24 19:22:39 +08:00 |
|
Ke Bao
|
6619f48e18
|
Fix cu118 group gemm compile issue (#3097)
|
2025-01-24 15:19:09 +08:00 |
|
Ke Bao
|
7bad7e75bf
|
Add shapes for int8 gemm benchmark (#3093)
|
2025-01-24 12:27:30 +08:00 |
|
Yineng Zhang
|
54bac8af0b
|
chore: bump sgl-kernel 0.0.2.post16 (#3087)
|
2025-01-24 01:57:48 +08:00 |
|