Cheng Wan | 6c88f6c8d9 | [5/N] MoE Refactor: Update MoE parallelism arguments (#8658) | 2025-08-01 01:20:03 -07:00
Qiaolin Yu | 41650b0d70 | feat: support compatibility between MTP and two-batch-overlap (#7225) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-06-27 01:10:27 -07:00
fzyzcjy | 4839999b76 | Overlap two kernels in DeepSeek with communication (#6711) | 2025-05-28 15:53:51 -07:00
Yi Zhang | f9bab3d591 | qwen3moe support two batch overlap (#6598) | 2025-05-25 23:08:16 -07:00
fzyzcjy | 0d47788025 | Support overlapping two batches (#4068) | 2025-05-24 17:39:07 -07:00
fzyzcjy | a38376fa99 | Refactor attention into multiple stages (#6477) | 2025-05-24 17:33:25 -07:00
fzyzcjy | d0443275f0 | Refactor DeepSeek logic into atomic operations (#6326) | 2025-05-19 21:05:30 -07:00