Commit Graph

170 Commits

Author SHA1 Message Date
Yineng Zhang
4ff1264201 Update pyproject.toml 2025-03-13 02:16:51 -07:00
Yineng Zhang
2a4cbad8e9 bump 0.0.5 sgl-kernel (#4377) 2025-03-13 02:08:35 -07:00
Yineng Zhang
2937387a50 fix accuracy issue (#4376) 2025-03-13 02:06:22 -07:00
Qingquan Song
4068e01292 Fix per token fp8 quant precision (#4362) 2025-03-12 21:19:05 -07:00
Shi Shuai
817d43705c feat: support ep size < 32 for sgl kernel (#4348) 2025-03-12 20:50:46 -07:00
Elfie Guo
7c86671131 Support Blackwell Block Scale FP8 Gemm (#4278) 2025-03-12 14:17:11 -07:00
Yineng Zhang
6e7239f912 release 0.0.4.post3 sgl-kernel (#4331) 2025-03-12 01:05:16 -07:00
Yineng Zhang
0a3960f21f fix awq_dequantize (#4333) 2025-03-12 01:04:38 -07:00
Rex
07f944631e Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) 2025-03-12 00:10:02 -07:00
Stefan He
e0917e6bd0 Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215)
Co-authored-by: Stefan He <bhe@linkedin.com>
2025-03-12 00:08:03 -07:00
Xiaoyu Zhang
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) 2025-03-11 22:48:38 -07:00
yigex
690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311)
Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>
2025-03-11 10:35:28 -07:00
Yineng Zhang
cd90945518 bump sgl-kernel 0.0.4.post2 (#4288) 2025-03-11 00:09:47 -07:00
Yineng Zhang
bde24ab31f update deepgemm (#4284) 2025-03-10 23:39:57 -07:00
Elfie Guo
bf2eefc0c7 Update cutlass dependency for its bug fix (#4277) 2025-03-10 17:00:05 -07:00
Yineng Zhang
3dd4feae63 add THIRDPARTYNOTICES for DeepGEMM (#4272) 2025-03-10 11:10:57 -07:00
Lianmin Zheng
cf0ccd406e Optimize rope in sgl kernel (#4267) 2025-03-10 10:07:45 -07:00
Lianmin Zheng
1a5023e05d Release sgl-kernel v0.0.4.post1 (#4255) 2025-03-10 02:39:50 -07:00
Xiaoyu Zhang
23308a9032 fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) 2025-03-10 01:42:58 -07:00
Lianmin Zheng
aa957102a9 Simplify tests & Fix trtllm custom allreduce registration (#4252) 2025-03-10 01:24:22 -07:00
laixin
c553e1604c DeepGemm integrate to sgl-kernel (#4165)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-03-10 00:35:07 -07:00
Lianmin Zheng
7c0541b385 Move activation.cu to sgl-kernel/elementwise (#4250) 2025-03-09 22:41:13 -07:00
Lianmin Zheng
730d084f2a Minor style fix for sgl-kernel (#4243) 2025-03-09 20:15:13 -07:00
Lianmin Zheng
eb06dbcbf8 Move rope and bmm into sgl-kernel (#4241) 2025-03-09 18:38:15 -07:00
Yineng Zhang
df84ab2a5b update sgl-kernel 3rdparty (#4228) 2025-03-09 01:16:05 -08:00
Yineng Zhang
5c7dd14ba1 chore: bump v0.0.4 for sgl-kernel (#4223) 2025-03-08 23:01:59 -08:00
Lianmin Zheng
8abf74e3c9 Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-03-08 22:54:51 -08:00
Xiaoyu Zhang
79a321af55 revert pr 3628 to pass test_mla ci (#4219) 2025-03-08 21:15:14 -08:00
Xiaoyu Zhang
b3251e9f40 refine quant kernel code style (#4211) 2025-03-08 05:47:35 -08:00
Lianmin Zheng
8d323e95e4 Use clang format 18 in pr-test-sgl-kernel.yml (#4203) 2025-03-08 01:28:10 -08:00
Yineng Zhang
96d0e37fa7 Revert "Minor improvement to per_tensor_quant_fp8 (#4197)" (#4198) 2025-03-07 22:57:09 -08:00
Rex
90bb2be27e Minor improvement to per_tensor_quant_fp8 (#4197) 2025-03-07 22:52:12 -08:00
lukec
b93ef5e56d Remove the vllm dependency from the moe_align function (#4164)
Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>
2025-03-07 22:42:16 -08:00
Lianmin Zheng
d052f4c8a9 New clang format for sgl kernel (#4194) 2025-03-07 20:21:08 -08:00
Yineng Zhang
eb61f5c9af Revert "ROCm: Flex Attention Enablement with custom backends (#4178)" (#4186) 2025-03-07 10:27:52 -08:00
HAI
0beea4503f ROCm: Flex Attention Enablement with custom backends (#4178)
Co-authored-by: linsun12 <linsun12@amd.com>
2025-03-07 04:38:53 -08:00
Yineng Zhang
96263f275c chore: bump v0.0.3.post7 for sgl-kernel (#4176) 2025-03-07 01:15:34 -08:00
Yineng Zhang
94a2b9d33e Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-03-07 00:01:17 -08:00
Stefan He
3c3eb374b2 Remove non-existent AMD header include (#4166) 2025-03-06 23:29:30 -08:00
Stefan He
95085d65e9 [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) 2025-03-06 22:58:52 -08:00
Stefan He
63ee26d162 Add sgl_per_token_quant_fp8 (#4089) 2025-03-06 20:53:05 -08:00
Xiaoyu Zhang
ad55f17182 [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) 2025-03-06 18:05:43 -08:00
Liu Jinjie
0804dd11a0 remove unused max_jobs in setup_rocm.py (#4126)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
2025-03-06 00:12:19 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Liu Jinjie
926f8efc0c remove unused max_jobs (#3607)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
2025-03-04 04:23:39 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
110e006673 Reorganize python source files in sgl-kernel with multiple files (#4027) 2025-03-03 06:36:40 -08:00
Lianmin Zheng
6b45a21d16 Reorganize c++ source files in sgl-kernel with multiple folders (#4025) 2025-03-03 05:32:30 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00