Commit Graph

195 Commits

Author SHA1 Message Date
yinfan98
4db29e82ec [Feat] support deepgemm for cmake (#4864) 2025-03-28 10:51:44 -07:00
Yineng Zhang
6dea5c96bf Revert "get the python version from env (#4729)" (#4863) 2025-03-28 08:07:48 -07:00
DavidChan
5eae67cb1f get the python version from env (#4729) 2025-03-27 22:26:42 -07:00
Yineng Zhang
31dfff7da7 use default for torch.ops (#4835) 2025-03-27 19:09:58 -07:00
Yineng Zhang
8bf6d7f406 support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-03-27 01:42:28 -07:00
Yi Pan
45fdf1f7f3 Fix shared memory OOM on sm86 GPUs. (#4797) 2025-03-26 10:41:53 -07:00
Trevor Morris
e9f8e42318 Support FP4 gemm (1/2) (#3899) 2025-03-24 19:50:23 -07:00
Chunan Zeng
65c24c28f9 [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) 2025-03-23 23:44:17 -07:00
Alex Sun
af6535e7aa [ROCm] Enable MTP (NextN) on AMD GPU (#4631) 2025-03-23 22:58:05 -07:00
AniZpZ
321ab756bc [1/3] fix dsv3 awq issue (#4556)
Co-authored-by: leoneo <1320612015@qq.com>
2025-03-22 01:07:17 -07:00
Chunan Zeng
6a384d5c01 Speed up per token and per tensor quant by 15% (#4639) 2025-03-22 00:37:57 -07:00
Shu Wang
ad4e58bf67 Support fp8 gemm for blackwell (#4558) 2025-03-20 12:40:28 -07:00
strgrb
f9c53cbb42 Create col-major and tma-aligned x_scale for deep_gemm.gemm_fp8_fp8_bf16_nt (#4515)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-03-19 00:02:43 -07:00
Yineng Zhang
988ab646ec bump v0.0.5.post3 (#4520) 2025-03-17 13:05:59 -07:00
Wenbo Yang
75b656488a Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) 2025-03-17 00:03:43 -07:00
yiakwy-xpu-ml-framework-team
9b8333d992 [ROCm] enable moe topk softmax in amd (#4448) 2025-03-16 18:16:55 -07:00
Yi Zhang
25e1816eff fix custom allreduce performance/accuracy problem (#4477) 2025-03-16 12:16:30 -07:00
Ying Sheng
1b859295f4 [Eagle] Remove the greedy branch and some redundant code (#4363)
Co-authored-by: Sehoon Kim <sehoon@x.ai>
2025-03-16 02:48:55 -07:00
Yineng Zhang
9971dc2283 Revert "feat: Add FlashMLA submodule (#4449)" (#4470) 2025-03-16 01:30:05 -07:00
Lianmin Zheng
3db35c1af4 Release sgl-kernel v0.0.5.post2 (#4469) 2025-03-16 01:01:53 -07:00
Ying Sheng
52a34d7448 Add greedy verification kernel (#4383) 2025-03-16 00:58:26 -07:00
JieXin Liang
1a3fa75f2f [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) 2025-03-16 00:02:47 -07:00
Shi Shuai
81f431eded feat: Add FlashMLA submodule (#4449)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-03-15 23:30:25 -07:00
Yineng Zhang
862fe52241 bump v0.0.5.post1 (#4437) 2025-03-14 15:00:26 -07:00
Qingquan Song
61e4433caf Add moe topk softmax templated from vllm (#4302) 2025-03-14 12:03:33 -07:00
Yineng Zhang
4ff1264201 Update pyproject.toml 2025-03-13 02:16:51 -07:00
Yineng Zhang
2a4cbad8e9 bump 0.0.5 sgl-kernel (#4377) 2025-03-13 02:08:35 -07:00
Yineng Zhang
2937387a50 fix accuracy issue (#4376) 2025-03-13 02:06:22 -07:00
Qingquan Song
4068e01292 Fix per token fp8 quant precision (#4362) 2025-03-12 21:19:05 -07:00
Shi Shuai
817d43705c feat: support ep size < 32 for sgl kernel (#4348) 2025-03-12 20:50:46 -07:00
Elfie Guo
7c86671131 Support Blackwell Block Scale FP8 Gemm (#4278) 2025-03-12 14:17:11 -07:00
Yineng Zhang
6e7239f912 release 0.0.4.post3 sgl-kernel (#4331) 2025-03-12 01:05:16 -07:00
Yineng Zhang
0a3960f21f fix awq_dequantize (#4333) 2025-03-12 01:04:38 -07:00
Rex
07f944631e Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) 2025-03-12 00:10:02 -07:00
Stefan He
e0917e6bd0 Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215)
Co-authored-by: Stefan He <bhe@linkedin.com>
2025-03-12 00:08:03 -07:00
Xiaoyu Zhang
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) 2025-03-11 22:48:38 -07:00
yigex
690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311)
Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>
2025-03-11 10:35:28 -07:00
Yineng Zhang
cd90945518 bump sgl-kernel 0.0.4.post2 (#4288) 2025-03-11 00:09:47 -07:00
Yineng Zhang
bde24ab31f update deepgemm (#4284) 2025-03-10 23:39:57 -07:00
Elfie Guo
bf2eefc0c7 Uupdate cutalss dependency for its bug fix (#4277) 2025-03-10 17:00:05 -07:00
Yineng Zhang
3dd4feae63 add THIRDPARTYNOTICES for DeepGEMM (#4272) 2025-03-10 11:10:57 -07:00
Lianmin Zheng
cf0ccd406e Optimize rope in sgl kernel (#4267) 2025-03-10 10:07:45 -07:00
Lianmin Zheng
1a5023e05d Release sgl-kernel v0.0.4.post1 (#4255) 2025-03-10 02:39:50 -07:00
Xiaoyu Zhang
23308a9032 fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) 2025-03-10 01:42:58 -07:00
Lianmin Zheng
aa957102a9 Simplify tests & Fix trtllm custom allreduce registration (#4252) 2025-03-10 01:24:22 -07:00
laixin
c553e1604c DeepGemm integrate to sgl-kernel (#4165)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-03-10 00:35:07 -07:00
Lianmin Zheng
7c0541b385 Move activation.cu to sgl-kernel/elementwise (#4250) 2025-03-09 22:41:13 -07:00
Lianmin Zheng
730d084f2a Minor style fix for sgl-kernel (#4243) 2025-03-09 20:15:13 -07:00
Lianmin Zheng
eb06dbcbf8 Move rope and bmm into sgl-kernel (#4241) 2025-03-09 18:38:15 -07:00
Yineng Zhang
df84ab2a5b update sgl-kernel 3rdparty (#4228) 2025-03-09 01:16:05 -08:00