Commit Graph

9 Commits

Author SHA1 Message Date
Bowen Bao
cd4b39a900 [quantization] Properly ignore quantization for layers excluded in quant_config (#11205)
2025-10-07 14:06:05 -07:00
Bowen Bao
baee08601b [quantization] Enable aiter mxfp4 fused_moe for Quark (#10048)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2025-10-05 19:51:34 -07:00
kk
8ebf72fef3 [Fix] RuntimeError: get_cfg Unsupported input_type:Float4_e2m1fn_x2 in using aiter-mxfp4-moe (#10981)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-09-26 22:12:22 -07:00
Cheng Wan
3fa62da78c [7/N] MoE Refactor: the implementation of new framework (#9269)
2025-09-05 21:09:09 -07:00
kk
e96973742c Optimized deepseek-v3/r1 model performance on mxfp4 run (#10008)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
2025-09-04 15:11:22 -07:00
Yineng Zhang
1b2ff4fb7f Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)" (#9959)
2025-09-03 00:50:04 -07:00
kk
0dfd54d11d Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: wghuang <wghuang@amd.com>
2025-09-02 22:26:28 -07:00
kk
1c1f8a118e Combine fp4.py and mxfp4.py into one file and support dynamic mxfp4 quantization in mxfp4.py (#9049)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-08-16 19:01:54 -07:00
kk
d4bf5a8524 Support OCP MXFP4 quantization on AMD GPUs (#8255)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-08-04 18:14:52 -07:00