5 Commits

Author SHA1 Message Date
Yuan Luo
3b87a9e8ae Fix bug of refactoring TopKOutput in w4afp8 (#8745) 2025-08-03 20:05:02 -07:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Cheng Wan
32fa1e9cc2 [4/N] MoE Refactor: Unified Triton Kernel for FusedMoE and EPMoE (#8515) 2025-07-31 02:34:02 -07:00
Cheng Wan
bf0f448fe5 [2/N] MoE Refactor: Unify weight loader and quant methods (#8397) 2025-07-27 01:00:21 -07:00
Cheng Wan
49b8777460 Refactor: move all quantization-related code to srt/layer/quantization (#7989) 2025-07-17 00:47:07 -07:00
SijiaYang
cb9d91ea8a feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762) 2025-07-07 14:47:21 -07:00
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>