57 Commits

Author SHA1 Message Date
Stefan He
63ee26d162 Add sgl_per_token_quant_fp8 (#4089) 2025-03-06 20:53:05 -08:00
Xiaoyu Zhang
ad55f17182 [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) 2025-03-06 18:05:43 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Liu Jinjie
926f8efc0c remove unused max_jobs (#3607)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
2025-03-04 04:23:39 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
6b45a21d16 Reorganize c++ source files in sgl-kernel with multiple folders (#4025) 2025-03-03 05:32:30 -08:00
Baizhou Zhang
67fc595bb8 [Feature] Apply Cublas Grouped Gemm kernel (#3629) 2025-02-18 15:18:31 +08:00
yizhang2077
640363ad20 support blockwise fp8 matmul kernel (#3267) 2025-02-13 01:49:33 +08:00
Xiaoyu Zhang
bb418ced80 optimize per token group quant fp8 (#3490) 2025-02-11 22:19:05 +08:00
Yineng Zhang
29daf498cd fix cu118 link issue (#3421) 2025-02-09 18:16:44 +08:00
Yineng Zhang
f9905d59a8 support speculative decoding kernel in sgl-kernel (#3373)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-07 20:29:51 +08:00
Yineng Zhang
00fa7d0417 add copyright for sgl-kernel (#3270) 2025-02-03 21:34:44 +08:00
Yineng Zhang
3ee62235c6 revert the MoE dependence (#3230) 2025-01-31 16:51:41 +08:00
Yineng Zhang
222ce6f1da add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216)
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
2025-01-30 23:04:41 +08:00
Yineng Zhang
468d23cff9 update setup for sgl-kernel (#3214) 2025-01-30 19:47:50 +08:00
Yineng Zhang
827aa8730b cleanup sgl-kernel kernels (#3175) 2025-01-27 19:11:01 +08:00
Lianmin Zheng
53cef81587 Improve weight loading and code style (#3174) 2025-01-27 03:00:41 -08:00
Byron Hsu
fb11a43981 [kernel] Integrate flashinfer's rope with higher precision and better perf (#3134) 2025-01-27 15:28:00 +08:00
Yineng Zhang
f265d15b96 use self-hosted to build sgl-kernel (#3154) 2025-01-26 23:02:57 +08:00
Yineng Zhang
02431b9ad2 fix link in README (#3153) 2025-01-26 21:30:00 +08:00
HandH1998
82392da830 support w8a8 fp8 kernel with CUTLASS (#3047)
Co-authored-by: yych0745 <1398089567@qq.com>
2025-01-26 15:46:51 +08:00
Yineng Zhang
95f789adb0 minor: cleanup sgl-kernel (#3143) 2025-01-26 14:29:58 +08:00
yinfan98
9286740eff feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130)
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>
Co-authored-by: yinfan98 <1106110035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-01-26 02:55:08 +08:00
Yineng Zhang
04f0b4cbef minor: update sgl-kernel setup (#3107) 2025-01-24 20:10:35 +08:00
Trevor Morris
685a5738a7 Allow local cutlass directory to be used in sgl-kernel build (#3037) 2025-01-24 03:59:47 -08:00
Ke Bao
6619f48e18 Fix cu118 group gemm compile issue (#3097) 2025-01-24 15:19:09 +08:00
Yineng Zhang
5de4051bcf feat: integrate sampling kernels into sgl-kernel (#3086)
Co-authored-by: Zihao Ye <expye@outlook.com>
2025-01-24 01:54:47 +08:00
Yineng Zhang
07a22cbba3 use env variable to control the build conf on the CPU build node (#3080) 2025-01-23 20:46:49 +08:00
Yineng Zhang
3d0bfa3e17 update version setup for sgl-kernel (#3079) 2025-01-23 19:45:25 +08:00
Lianmin Zheng
553f5a3ffe Remove torch dependency in sgl-kernel (#3074) 2025-01-23 17:23:37 +08:00
Xiaoyu Zhang
ac2dc35d0e support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) 2025-01-23 15:29:20 +08:00
Yineng Zhang
bf669606eb feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) 2025-01-23 00:39:38 +08:00
Yineng Zhang
bcda0c9ee6 sync the upstream updates of flashinfer (#3051) 2025-01-22 20:33:13 +08:00
Yineng Zhang
9f8f2c7f74 update norm cu (#3048) 2025-01-22 18:58:44 +08:00
Ke Bao
6fc37bd8ee Fix sgl-kernel compile for sm80 (#3046) 2025-01-22 16:49:08 +08:00
Yineng Zhang
5a0d680a14 feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) 2025-01-21 20:44:49 +08:00
Ke Bao
5dfcacfcb1 Add compile flags for cutlass 3.x (#3013)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-01-21 00:04:12 +08:00
Byron Hsu
b5caa22dfb [kernel] port rope cuda kernel to sgl-kernel (#2993)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-01-20 20:58:51 +08:00
lukec
6f98c586bd fix sgl-kernel setup.py (#2963) 2025-01-18 18:50:37 +08:00
Yineng Zhang
2dc957d421 fix setup for sgl kernel (#2917) 2025-01-16 18:17:34 +08:00
Yineng Zhang
a53454c55e fix: sgl-kernel link cuda (#2906) 2025-01-16 04:53:23 +08:00
yizhang2077
6cb3974e77 optimize custom allreduce kernel (#2904) 2025-01-16 03:04:25 +08:00
Xiaoyu Zhang
e2b16c4716 add sampling_scaling_penalties kernel (#2846) 2025-01-12 19:38:17 -08:00
Ke Bao
0f3eb1d294 Support cutlass Int8 gemm (#2752) 2025-01-06 22:51:22 +08:00
Yineng Zhang
b6b57fc200 minor: cleanup sgl-kernel (#2679) 2024-12-31 14:52:00 +08:00
Ke Bao
b4403985d0 Add cutlass submodule for sgl-kernel (#2676) 2024-12-31 14:28:29 +08:00
Ke Bao
b02da24a5b Refactor sgl-kernel build (#2642) 2024-12-30 18:07:01 +08:00
Yineng Zhang
31548116a8 fix moe_align_block_size_kernel for shared memory issue (#2579)
Co-authored-by: ispobock <ispobaoke@163.com>
2024-12-26 05:31:04 +08:00
yizhang2077
e04d3f2897 adapt tensorrt llm custom all reduce to sgl-kernel (#2481)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-12-15 13:15:59 +08:00
Yineng Zhang
2673fa29d4 fix: set runtime path (#2466) 2024-12-12 18:05:48 +08:00