Commit Graph

7 Commits

Author      SHA1        Message                                                                Date
Cheng Wan   6c88f6c8d9  [5/N] MoE Refactor: Update MoE parallelism arguments (#8658)           2025-08-01 01:20:03 -07:00
Qiaolin Yu  41650b0d70  feat: support compatibility between MTP and two-batch-overlap (#7225)  2025-06-27 01:10:27 -07:00
                        Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
fzyzcjy     4839999b76  Overlap two kernels in DeepSeek with communication (#6711)             2025-05-28 15:53:51 -07:00
Yi Zhang    f9bab3d591  qwen3moe support two batch overlap (#6598)                             2025-05-25 23:08:16 -07:00
fzyzcjy     0d47788025  Support overlapping two batches (#4068)                                2025-05-24 17:39:07 -07:00
fzyzcjy     a38376fa99  Refactor attention into multiple stages (#6477)                        2025-05-24 17:33:25 -07:00
fzyzcjy     d0443275f0  Refactor DeepSeek logic into atomic operations (#6326)                 2025-05-19 21:05:30 -07:00