Commit Graph

9 Commits

Author | SHA1 | Message | Date
Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00
Qingquan Song | 4068e01292 | Fix per token fp8 quant precision (#4362) | 2025-03-12 21:19:05 -07:00
Elfie Guo | 7c86671131 | Support Blackwell Block Scale FP8 Gemm (#4278) | 2025-03-12 14:17:11 -07:00
Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00
Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215); Co-authored-by: Stefan He <bhe@linkedin.com> | 2025-03-12 00:08:03 -07:00
Xiaoyu Zhang | 23308a9032 | fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231) | 2025-03-10 01:42:58 -07:00
Lianmin Zheng | aa957102a9 | Simplify tests & Fix trtllm custom allreduce registration (#4252) | 2025-03-10 01:24:22 -07:00
Lianmin Zheng | eb06dbcbf8 | Move rope and bmm into sgl-kernel (#4241) | 2025-03-09 18:38:15 -07:00
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00