13 Commits

Author SHA1 Message Date
maxiao
852a49c5cc adapt to dsv32 on dcu 2025-09-30 18:37:31 +08:00
Xiaoyu Zhang
c4e314f986 Restruct sgl-kernel benchmark (#10861) 2025-09-25 07:45:25 +08:00
Even Zhou
b67c277f86 [Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013) 2025-09-07 21:50:49 -07:00
Cheng Wan
a5a03209e9 Fix circular import (#10107) 2025-09-06 01:34:17 -07:00
Cheng Wan
3fa62da78c [7/N] MoE Refactor: the implementation of new framework (#9269) 2025-09-05 21:09:09 -07:00
chenxj
d4a938417d [feat] Support tp mode for DeepSeek-R1-W4AFP8 (#8118) 2025-09-01 22:17:26 -07:00
Co-authored-by: yuhyao <827623970@qq.com>
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
SijiaYang
90f44b74e6 fix: w4afp8 accuracy problem and rebase (#8752) 2025-08-11 13:41:19 -07:00
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>
Yuan Luo
3b87a9e8ae Fix bug of refactoring TopKOutput in w4afp8 (#8745) 2025-08-03 20:05:02 -07:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Cheng Wan
32fa1e9cc2 [4/N] MoE Refactor: Unified Triton Kernel for FusedMoE and EPMoE (#8515) 2025-07-31 02:34:02 -07:00
Cheng Wan
bf0f448fe5 [2/N] MoE Refactor: Unify weight loader and quant methods (#8397) 2025-07-27 01:00:21 -07:00
Cheng Wan
49b8777460 Refactor: move all quantization-related code to srt/layer/quantization (#7989) 2025-07-17 00:47:07 -07:00
SijiaYang
cb9d91ea8a feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762) 2025-07-07 14:47:21 -07:00
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>