chenxj
d4a938417d
[feat] Support tp mode for DeepSeek-R1-W4AFP8 (#8118)
Co-authored-by: yuhyao <827623970@qq.com>
2025-09-01 22:17:26 -07:00

Cheng Wan
295895120d
[6/N] MoE Refactor: Cleanup MoE-related configs (#8849)
2025-08-14 21:14:53 -07:00

SijiaYang
90f44b74e6
fix: w4afp8 accuracy problem and rebase (#8752)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>
2025-08-11 13:41:19 -07:00

Yuan Luo
3b87a9e8ae
Fix bug of refactoring TopKOutput in w4afp8 (#8745)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-08-03 20:05:02 -07:00

Cheng Wan
32fa1e9cc2
[4/N] MoE Refactor: Unified Triton Kernel for FusedMoE and EPMoE (#8515)
2025-07-31 02:34:02 -07:00

Cheng Wan
bf0f448fe5
[2/N] MoE Refactor: Unify weight loader and quant methods (#8397)
2025-07-27 01:00:21 -07:00

Cheng Wan
49b8777460
Refactor: move all quantization-related code to srt/layer/quantization (#7989)
2025-07-17 00:47:07 -07:00

SijiaYang
cb9d91ea8a
feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762)
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
2025-07-07 14:47:21 -07:00