Commit Graph

6 Commits

Author | SHA1 | Message | Date
Stefan He | db7343c992 | fix per token cuda kernel hidden dim cannot divide by 16 (#8543) | 2025-08-01 09:27:18 -07:00
Zhaoyi Li | 3c9740d200 | update variable naming and comments for rocm (#5299) | 2025-04-11 23:15:05 -07:00
Yineng Zhang | 2937387a50 | fix accuracy issue (#4376) | 2025-03-13 02:06:22 -07:00
Qingquan Song | 4068e01292 | Fix per token fp8 quant precision (#4362) | 2025-03-12 21:19:05 -07:00
Stefan He | 95085d65e9 | [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) | 2025-03-06 22:58:52 -08:00
Stefan He | 63ee26d162 | Add sgl_per_token_quant_fp8 (#4089) | 2025-03-06 20:53:05 -08:00