134 Commits

Author SHA1 Message Date
Yineng Zhang
96263f275c chore: bump v0.0.3.post7 for sgl-kernel (#4176) 2025-03-07 01:15:34 -08:00
Yineng Zhang
94a2b9d33e Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168) 2025-03-07 00:01:17 -08:00
  Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Stefan He
3c3eb374b2 Remove non-existent AMD header include (#4166) 2025-03-06 23:29:30 -08:00
Stefan He
95085d65e9 [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) 2025-03-06 22:58:52 -08:00
Stefan He
63ee26d162 Add sgl_per_token_quant_fp8 (#4089) 2025-03-06 20:53:05 -08:00
Xiaoyu Zhang
ad55f17182 [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) 2025-03-06 18:05:43 -08:00
Liu Jinjie
0804dd11a0 remove unused max_jobs in setup_rocm.py (#4126) 2025-03-06 00:12:19 -08:00
  Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Liu Jinjie
926f8efc0c remove unused max_jobs (#3607) 2025-03-04 04:23:39 -08:00
  Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
110e006673 Reorganize python source files in sgl-kernel with multiple files (#4027) 2025-03-03 06:36:40 -08:00
Lianmin Zheng
6b45a21d16 Reorganize c++ source files in sgl-kernel with multiple folders (#4025) 2025-03-03 05:32:30 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
  Co-authored-by: SangBin Cho <rkooo567@gmail.com>
  Co-authored-by: dhou-xai <dhou@x.ai>
  Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Hubert Lu
9cf4077294 Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406) 2025-03-02 15:19:06 -08:00
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
Elfie Guo
9e74ee91da Update cutlass dependency (#3966) 2025-02-28 16:16:31 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
Xiaoyu Zhang
55a7ec388f use warp shuffle style reduce and flashinfer vectorize (#3628) 2025-02-19 20:53:51 +08:00
Baizhou Zhang
67fc595bb8 [Feature] Apply Cublas Grouped Gemm kernel (#3629) 2025-02-18 15:18:31 +08:00
Xiaoyu Zhang
3efbdf68b9 fix sgl-kernel codestyle (#3563) 2025-02-14 18:05:52 +08:00
Yineng Zhang
e082142519 chore: bump 0.0.3.post6 sgl-kernel (#3555) 2025-02-14 08:55:15 +08:00
Xiaoyu Zhang
f076328bb7 fix moe_align_kernel shm init not sync bug (#3534) 2025-02-13 16:47:00 +08:00
Yineng Zhang
4430c0a513 chore: bump 0.0.3.post5 sgl-kernel (#3530) 2025-02-13 01:51:46 +08:00
yizhang2077
640363ad20 support blockwise fp8 matmul kernel (#3267) 2025-02-13 01:49:33 +08:00
Yineng Zhang
b96e92e6e6 chore: bump 0.0.3.post4 sgl-kernel (#3523) 2025-02-12 17:28:36 +08:00
Xiaoyu Zhang
bb418ced80 optimize per token group quant fp8 (#3490) 2025-02-11 22:19:05 +08:00
Yineng Zhang
6239d0b2e7 chore: bump sgl-kernel v0.0.3.post3 (#3440) 2025-02-10 04:00:52 +08:00
Yineng Zhang
4cfd3add6d support version in sgl-kernel (#3439) 2025-02-10 03:49:52 +08:00
Yineng Zhang
29daf498cd fix cu118 link issue (#3421) 2025-02-09 18:16:44 +08:00
Yineng Zhang
f9905d59a8 support speculative decoding kernel in sgl-kernel (#3373) 2025-02-07 20:29:51 +08:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Yineng Zhang
45c87e083f fix undefined symbol cudaGetDriverEntryPointByVersion (#3372) 2025-02-07 19:32:45 +08:00
Xiaoyu Zhang
cdae77b03d optimize moe_align_kernel cuda (#3347) 2025-02-07 00:53:46 +08:00
Yineng Zhang
adeee15204 fix sgl-kernel build failure on AMD (#3352) 2025-02-07 00:35:59 +08:00
Xiaoyu Zhang
ad3499858e clean moe align block kernel code and add acc test (#3332) 2025-02-06 16:42:36 +08:00
HAI
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) 2025-02-04 21:44:44 +08:00
Yineng Zhang
00fa7d0417 add copyright for sgl-kernel (#3270) 2025-02-03 21:34:44 +08:00
Yineng Zhang
7876279ea7 update cutlass dependency (#3240) 2025-02-01 03:13:44 +08:00
Yineng Zhang
3ee62235c6 revert the MoE dependence (#3230) 2025-01-31 16:51:41 +08:00
Yineng Zhang
9602c2aac7 keep the parts needed for moe_kernels (#3218) 2025-01-31 00:39:47 +08:00
Yineng Zhang
e81d7f11de add tensorrt_llm moe_gemm as 3rdparty (#3217) 2025-01-30 23:49:14 +08:00
Yineng Zhang
222ce6f1da add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) 2025-01-30 23:04:41 +08:00
  Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
Yineng Zhang
468d23cff9 update setup for sgl-kernel (#3214) 2025-01-30 19:47:50 +08:00
Yineng Zhang
c38b5fb4f4 update 3rdparty and rms norm for sgl-kernel (#3213) 2025-01-30 19:32:21 +08:00
Xiaoyu Zhang
81262c7b72 clean up useless file (#3192) 2025-01-28 14:29:30 +08:00
Yineng Zhang
8a96f74988 chore: bump 0.0.3 for sgl-kernel (#3178) 2025-01-27 20:29:28 +08:00
  Co-authored-by: ispobock <ispobaoke@hotmail.com>
  Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
  Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
  Co-authored-by: yizhang2077 <1109276519@qq.com>
  Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
Yineng Zhang
827aa8730b cleanup sgl-kernel kernels (#3175) 2025-01-27 19:11:01 +08:00
Lianmin Zheng
53cef81587 Improve weight loading and code style (#3174) 2025-01-27 03:00:41 -08:00
Byron Hsu
514f37c32b [kernel] Fix position ids in rope (#3173) 2025-01-27 17:09:51 +08:00
Byron Hsu
741fccd7bf Bump sgl kernel to 0.0.2.post19 (#3167) 2025-01-27 15:36:07 +08:00