Yineng Zhang | 6dea5c96bf | Revert "get the python version from env (#4729)" (#4863) | 2025-03-28 08:07:48 -07:00 |
DavidChan | 5eae67cb1f | get the python version from env (#4729) | 2025-03-27 22:26:42 -07:00 |
Trevor Morris | e9f8e42318 | Support FP4 gemm (1/2) (#3899) | 2025-03-24 19:50:23 -07:00 |
Chunan Zeng | 65c24c28f9 | [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) | 2025-03-23 23:44:17 -07:00 |
Yineng Zhang | 9971dc2283 | Revert "feat: Add FlashMLA submodule (#4449)" (#4470) | 2025-03-16 01:30:05 -07:00 |
Ying Sheng | 52a34d7448 | Add greedy verification kernel (#4383) | 2025-03-16 00:58:26 -07:00 |
Shi Shuai | 81f431eded | feat: Add FlashMLA submodule (#4449) Co-authored-by: sleepcoo <sleepcoo@gmail.com> | 2025-03-15 23:30:25 -07:00 |
Qingquan Song | 61e4433caf | Add moe topk softmax templated from vllm (#4302) | 2025-03-14 12:03:33 -07:00 |
Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00 |
Elfie Guo | 7c86671131 | Support Blackwell Block Scale FP8 Gemm (#4278) | 2025-03-12 14:17:11 -07:00 |
Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00 |
laixin | c553e1604c | DeepGemm integrate to sgl-kernel (#4165) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-03-10 00:35:07 -07:00 |
Lianmin Zheng | 7c0541b385 | Move activation.cu to sgl-kernel/elementwise (#4250) | 2025-03-09 22:41:13 -07:00 |
Lianmin Zheng | 730d084f2a | Minor style fix for sgl-kernel (#4243) | 2025-03-09 20:15:13 -07:00 |
Lianmin Zheng | eb06dbcbf8 | Move rope and bmm into sgl-kernel (#4241) | 2025-03-09 18:38:15 -07:00 |
Yineng Zhang | df84ab2a5b | update sgl-kernel 3rdparty (#4228) | 2025-03-09 01:16:05 -08:00 |
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00 |
Stefan He | 63ee26d162 | Add sgl_per_token_quant_fp8 (#4089) | 2025-03-06 20:53:05 -08:00 |
Xiaoyu Zhang | ad55f17182 | [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) | 2025-03-06 18:05:43 -08:00 |
Lianmin Zheng | e074d84e5b | [Minor] more code cleanup (#4077) | 2025-03-04 21:23:47 -08:00 |
Liu Jinjie | 926f8efc0c | remove unused max_jobs (#3607) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu> | 2025-03-04 04:23:39 -08:00 |
Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00 |
Lianmin Zheng | 6b45a21d16 | Reorganize c++ source files in sgl-kernel with multiple folders (#4025) | 2025-03-03 05:32:30 -08:00 |
Baizhou Zhang | 67fc595bb8 | [Feature] Apply Cublas Grouped Gemm kernel (#3629) | 2025-02-18 15:18:31 +08:00 |
yizhang2077 | 640363ad20 | support blockwise fp8 matmul kernel (#3267) | 2025-02-13 01:49:33 +08:00 |
Xiaoyu Zhang | bb418ced80 | optimize per token group quant fp8 (#3490) | 2025-02-11 22:19:05 +08:00 |
Yineng Zhang | 29daf498cd | fix cu118 link issue (#3421) | 2025-02-09 18:16:44 +08:00 |
Yineng Zhang | f9905d59a8 | support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2025-02-07 20:29:51 +08:00 |
Yineng Zhang | 00fa7d0417 | add copyright for sgl-kernel (#3270) | 2025-02-03 21:34:44 +08:00 |
Yineng Zhang | 3ee62235c6 | revert the MoE dependence (#3230) | 2025-01-31 16:51:41 +08:00 |
Yineng Zhang | 222ce6f1da | add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com> | 2025-01-30 23:04:41 +08:00 |
Yineng Zhang | 468d23cff9 | update setup for sgl-kernel (#3214) | 2025-01-30 19:47:50 +08:00 |
Yineng Zhang | 827aa8730b | cleanup sgl-kernel kernels (#3175) | 2025-01-27 19:11:01 +08:00 |
Lianmin Zheng | 53cef81587 | Improve weight loading and code style (#3174) | 2025-01-27 03:00:41 -08:00 |
Byron Hsu | fb11a43981 | [kernel] Integrate flashinfer's rope with higher precision and better perf (#3134) | 2025-01-27 15:28:00 +08:00 |
Yineng Zhang | f265d15b96 | use self-hosted to build sgl-kernel (#3154) | 2025-01-26 23:02:57 +08:00 |
Yineng Zhang | 02431b9ad2 | fix link in README (#3153) | 2025-01-26 21:30:00 +08:00 |
HandH1998 | 82392da830 | support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com> | 2025-01-26 15:46:51 +08:00 |
Yineng Zhang | 95f789adb0 | minor: cleanup sgl-kernel (#3143) | 2025-01-26 14:29:58 +08:00 |
yinfan98 | 9286740eff | feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-01-26 02:55:08 +08:00 |
Yineng Zhang | 04f0b4cbef | minor: update sgl-kernel setup (#3107) | 2025-01-24 20:10:35 +08:00 |
Trevor Morris | 685a5738a7 | Allow local cutlass directory to be used in sgl-kernel build (#3037) | 2025-01-24 03:59:47 -08:00 |
Ke Bao | 6619f48e18 | Fix cu118 group gemm compile issue (#3097) | 2025-01-24 15:19:09 +08:00 |
Yineng Zhang | 5de4051bcf | feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com> | 2025-01-24 01:54:47 +08:00 |
Yineng Zhang | 07a22cbba3 | use env variable to control the build conf on the CPU build node (#3080) | 2025-01-23 20:46:49 +08:00 |
Yineng Zhang | 3d0bfa3e17 | update version setup for sgl-kernel (#3079) | 2025-01-23 19:45:25 +08:00 |
Lianmin Zheng | 553f5a3ffe | Remove torch dependency in sgl-kernel (#3074) | 2025-01-23 17:23:37 +08:00 |
Xiaoyu Zhang | ac2dc35d0e | support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) | 2025-01-23 15:29:20 +08:00 |
Yineng Zhang | bf669606eb | feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) | 2025-01-23 00:39:38 +08:00 |
Yineng Zhang | bcda0c9ee6 | sync the upstream updates of flashinfer (#3051) | 2025-01-22 20:33:13 +08:00 |