Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00
Qingquan Song | 4068e01292 | Fix per token fp8 quant precision (#4362) | 2025-03-12 21:19:05 -07:00
Shi Shuai | 817d43705c | feat: support ep size < 32 for sgl kernel (#4348) | 2025-03-12 20:50:46 -07:00
Elfie Guo | 7c86671131 | Support Blackwell Block Scale FP8 Gemm (#4278) | 2025-03-12 14:17:11 -07:00
Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00
Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215); Co-authored-by: Stefan He <bhe@linkedin.com> | 2025-03-12 00:08:03 -07:00
yigex | 690e1f2371 | [AMD] Fix rocm sgl-kernel missing modules error (#4311); Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com> | 2025-03-11 10:35:28 -07:00
Lianmin Zheng | cf0ccd406e | Optimize rope in sgl kernel (#4267) | 2025-03-10 10:07:45 -07:00
Xiaoyu Zhang | 23308a9032 | fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) | 2025-03-10 01:42:58 -07:00
Lianmin Zheng | aa957102a9 | Simplify tests & Fix trtllm custom allreduce registration (#4252) | 2025-03-10 01:24:22 -07:00
Lianmin Zheng | 7c0541b385 | Move activation.cu to sgl-kernel/elementwise (#4250) | 2025-03-09 22:41:13 -07:00
Lianmin Zheng | 730d084f2a | Minor style fix for sgl-kernel (#4243) | 2025-03-09 20:15:13 -07:00
Lianmin Zheng | eb06dbcbf8 | Move rope and bmm into sgl-kernel (#4241) | 2025-03-09 18:38:15 -07:00
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00