Stefan He
|
e0917e6bd0
|
Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215)
Co-authored-by: Stefan He <bhe@linkedin.com>
|
2025-03-12 00:08:03 -07:00 |
|
Xiaoyu Zhang
|
7130a7cea9
|
refine sgl_moe_align_block_size_benchmark (#4327)
|
2025-03-11 22:48:38 -07:00 |
|
yigex
|
690e1f2371
|
[AMD] Fix rocm sgl-kernel missing modules error (#4311)
Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>
|
2025-03-11 10:35:28 -07:00 |
|
Yineng Zhang
|
cd90945518
|
bump sgl-kernel 0.0.4.post2 (#4288)
|
2025-03-11 00:09:47 -07:00 |
|
Yineng Zhang
|
bde24ab31f
|
update deepgemm (#4284)
|
2025-03-10 23:39:57 -07:00 |
|
Elfie Guo
|
bf2eefc0c7
|
Update cutlass dependency for its bug fix (#4277)
|
2025-03-10 17:00:05 -07:00 |
|
Yineng Zhang
|
3dd4feae63
|
add THIRDPARTYNOTICES for DeepGEMM (#4272)
|
2025-03-10 11:10:57 -07:00 |
|
Lianmin Zheng
|
cf0ccd406e
|
Optimize rope in sgl kernel (#4267)
|
2025-03-10 10:07:45 -07:00 |
|
Lianmin Zheng
|
1a5023e05d
|
Release sgl-kernel v0.0.4.post1 (#4255)
|
2025-03-10 02:39:50 -07:00 |
|
Xiaoyu Zhang
|
23308a9032
|
fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231)
|
2025-03-10 01:42:58 -07:00 |
|
Lianmin Zheng
|
aa957102a9
|
Simplify tests & Fix trtllm custom allreduce registration (#4252)
|
2025-03-10 01:24:22 -07:00 |
|
laixin
|
c553e1604c
|
DeepGemm integrate to sgl-kernel (#4165)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-03-10 00:35:07 -07:00 |
|
Lianmin Zheng
|
7c0541b385
|
Move activation.cu to sgl-kernel/elementwise (#4250)
|
2025-03-09 22:41:13 -07:00 |
|
Lianmin Zheng
|
730d084f2a
|
Minor style fix for sgl-kernel (#4243)
|
2025-03-09 20:15:13 -07:00 |
|
Lianmin Zheng
|
eb06dbcbf8
|
Move rope and bmm into sgl-kernel (#4241)
|
2025-03-09 18:38:15 -07:00 |
|
Yineng Zhang
|
df84ab2a5b
|
update sgl-kernel 3rdparty (#4228)
|
2025-03-09 01:16:05 -08:00 |
|
Yineng Zhang
|
5c7dd14ba1
|
chore: bump v0.0.4 for sgl-kernel (#4223)
|
2025-03-08 23:01:59 -08:00 |
|
Lianmin Zheng
|
8abf74e3c9
|
Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-03-08 22:54:51 -08:00 |
|
Xiaoyu Zhang
|
79a321af55
|
revert pr 3628 to pass test_mla ci (#4219)
|
2025-03-08 21:15:14 -08:00 |
|
Xiaoyu Zhang
|
b3251e9f40
|
refine quant kernel code style (#4211)
|
2025-03-08 05:47:35 -08:00 |
|
Lianmin Zheng
|
8d323e95e4
|
Use clang format 18 in pr-test-sgl-kernel.yml (#4203)
|
2025-03-08 01:28:10 -08:00 |
|
Yineng Zhang
|
96d0e37fa7
|
Revert "Minor improvement to per_tensor_quant_fp8 (#4197)" (#4198)
|
2025-03-07 22:57:09 -08:00 |
|
Rex
|
90bb2be27e
|
Minor improvement to per_tensor_quant_fp8 (#4197)
|
2025-03-07 22:52:12 -08:00 |
|
lukec
|
b93ef5e56d
|
Remove the vllm dependency from the moe_align function (#4164)
Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>
|
2025-03-07 22:42:16 -08:00 |
|
Lianmin Zheng
|
d052f4c8a9
|
New clang format for sgl kernel (#4194)
|
2025-03-07 20:21:08 -08:00 |
|
Yineng Zhang
|
eb61f5c9af
|
Revert "ROCm: Flex Attention Enablement with custom backends (#4178)" (#4186)
|
2025-03-07 10:27:52 -08:00 |
|
HAI
|
0beea4503f
|
ROCm: Flex Attention Enablement with custom backends (#4178)
Co-authored-by: linsun12 <linsun12@amd.com>
|
2025-03-07 04:38:53 -08:00 |
|
Yineng Zhang
|
96263f275c
|
chore: bump v0.0.3.post7 for sgl-kernel (#4176)
|
2025-03-07 01:15:34 -08:00 |
|
Yineng Zhang
|
94a2b9d33e
|
Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-03-07 00:01:17 -08:00 |
|
Stefan He
|
3c3eb374b2
|
Remove non-existent AMD header include (#4166)
|
2025-03-06 23:29:30 -08:00 |
|
Stefan He
|
95085d65e9
|
[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)
|
2025-03-06 22:58:52 -08:00 |
|
Stefan He
|
63ee26d162
|
Add sgl_per_token_quant_fp8 (#4089)
|
2025-03-06 20:53:05 -08:00 |
|
Xiaoyu Zhang
|
ad55f17182
|
[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)
|
2025-03-06 18:05:43 -08:00 |
|
Liu Jinjie
|
0804dd11a0
|
remove unused max_jobs in setup_rocm.py (#4126)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
|
2025-03-06 00:12:19 -08:00 |
|
Lianmin Zheng
|
e074d84e5b
|
[Minor] more code cleanup (#4077)
|
2025-03-04 21:23:47 -08:00 |
|
Liu Jinjie
|
926f8efc0c
|
remove unused max_jobs (#3607)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
|
2025-03-04 04:23:39 -08:00 |
|
Lianmin Zheng
|
935cda944b
|
Misc clean up; Remove the support of jump forward (#4032)
|
2025-03-03 07:02:14 -08:00 |
|
Lianmin Zheng
|
110e006673
|
Reorganize python source files in sgl-kernel with multiple files (#4027)
|
2025-03-03 06:36:40 -08:00 |
|
Lianmin Zheng
|
6b45a21d16
|
Reorganize c++ source files in sgl-kernel with multiple folders (#4025)
|
2025-03-03 05:32:30 -08:00 |
|
Lianmin Zheng
|
66301e124f
|
Improve code styles (#4021)
|
2025-03-03 03:20:23 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Hubert Lu
|
9cf4077294
|
Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406)
|
2025-03-02 15:19:06 -08:00 |
|
Chayenne
|
18bb216c28
|
Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)
|
2025-02-28 23:57:17 -08:00 |
|
Elfie Guo
|
9e74ee91da
|
Update cutlass dependency (#3966)
|
2025-02-28 16:16:31 -08:00 |
|
yiakwy-xpu-ml-framework-team
|
1c96fa86cf
|
[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)
|
2025-02-27 19:42:48 -08:00 |
|
Xiaoyu Zhang
|
55a7ec388f
|
use warp shuffle style reduce and flashinfer vectorize (#3628)
|
2025-02-19 20:53:51 +08:00 |
|
Baizhou Zhang
|
67fc595bb8
|
[Feature] Apply Cublas Grouped Gemm kernel (#3629)
|
2025-02-18 15:18:31 +08:00 |
|
Xiaoyu Zhang
|
3efbdf68b9
|
fix sgl-kernel codestyle (#3563)
|
2025-02-14 18:05:52 +08:00 |
|
Yineng Zhang
|
e082142519
|
chore: bump 0.0.3.post6 sgl-kernel (#3555)
|
2025-02-14 08:55:15 +08:00 |
|
Xiaoyu Zhang
|
f076328bb7
|
fix moe_align_kernel shm init not sync bug (#3534)
|
2025-02-13 16:47:00 +08:00 |
|