Commit Graph

3 Commits

Author        SHA1        Message                                                                            Date
Stefan He     95085d65e9  [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)  2025-03-06 22:58:52 -08:00
Xiaoyu Zhang  55a7ec388f  use warp shuffle style reduce and flashinfer vectorize (#3628)                     2025-02-19 20:53:51 +08:00
Xiaoyu Zhang  bb418ced80  optimize per token group quant fp8 (#3490)                                         2025-02-11 22:19:05 +08:00