Author | Commit | Message | Date
Yuan Luo | 3b87a9e8ae | Fix bug of refactoring TopKOutput in w4afp8 (#8745). Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-08-03 20:05:02 -07:00
Cheng Wan | 32fa1e9cc2 | [4/N] MoE Refactor: Unified Triton Kernel for FusedMoE and EPMoE (#8515) | 2025-07-31 02:34:02 -07:00
Cheng Wan | bf0f448fe5 | [2/N] MoE Refactor: Unify weight loader and quant methods (#8397) | 2025-07-27 01:00:21 -07:00
Cheng Wan | 49b8777460 | Refactor: move all quantization-related code to srt/layer/quantization (#7989) | 2025-07-17 00:47:07 -07:00
SijiaYang | cb9d91ea8a | feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762). Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> | 2025-07-07 14:47:21 -07:00