Commit Graph

158 Commits

Author SHA1 Message Date
Li Hui
69dd878b51 Fix shared experts fusion error (#6289) 2025-05-30 01:16:11 -07:00
Zilin Zhu
51cdd81f97 [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight (#6265) 2025-05-29 16:28:10 -07:00
fzyzcjy
31589e177e Speed up two-batch overlap when having padding tokens (#6668)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-05-28 16:00:58 -07:00
fzyzcjy
541a985f85 Fuse routed_scaling_factor in DeepSeek (#6710) 2025-05-28 15:53:37 -07:00
HAI
183d9f969c DeepSeek: enable non-block-quant FP8 quantizations (#6638) 2025-05-27 09:06:40 -07:00
fzyzcjy
32cd707002 Support TP in attention for two batch overlap (#6634) 2025-05-26 20:28:12 -07:00
fzyzcjy
0ca3e56802 Tiny fix missing expert location dispatch info (#6620) 2025-05-26 08:58:31 -07:00
Yi Zhang
65f091310c refactor qwen moe code, use communicator to support tp+dp (#6581) 2025-05-25 23:01:10 -07:00
fzyzcjy
0d47788025 Support overlapping two batches (#4068) 2025-05-24 17:39:07 -07:00
fzyzcjy
b2388433be Add back DeepSeek non-TBO branches (#6578) 2025-05-24 17:34:00 -07:00
fzyzcjy
a38376fa99 Refactor attention into multiple stages (#6477) 2025-05-24 17:33:25 -07:00
fzyzcjy
fc992a09f9 Support updating expert locations dynamically (#6388) 2025-05-21 21:59:33 -07:00
Baizhou Zhang
d4c038daed [Fix] Fix capture failure bug for DeepSeek (#6275) 2025-05-21 11:11:20 -07:00
fzyzcjy
ccfe5c009d Support redundant experts in expert parallel (#6461) 2025-05-21 02:05:53 -07:00
fzyzcjy
d6e1d28c8a Refactor DeepSeek attention dispatching (#6476) 2025-05-21 02:03:39 -07:00
Lianmin Zheng
03886917bd Disable all two stream overlap on amd (#6475) 2025-05-20 19:06:59 -07:00
fzyzcjy
13feffd082 Fix master CI for DeepSeek (#6447) 2025-05-20 00:31:42 -07:00
fzyzcjy
e98afbe042 Support dispatching logical to physical experts (#6385) 2025-05-19 22:13:55 -07:00
HAI
6317c5c61f Address performance regression: disable multiple streams on ROCm (#6412) 2025-05-19 21:16:20 -07:00
fzyzcjy
d0443275f0 Refactor DeepSeek logic into atomic operations (#6326) 2025-05-19 21:05:30 -07:00
fzyzcjy
1b19df4b2a Refactor communication logic of DeepSeek for extensibility and understandability (#6321) 2025-05-19 20:14:48 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
fzyzcjy
72bfb0baf0 Refactor DeepSeek MoE layer to unify the two forward branches (#6325) 2025-05-18 15:34:36 -07:00
fzyzcjy
2716830802 Speed up when having padding tokens in DeepEP (#6175) 2025-05-17 16:44:05 -07:00
fzyzcjy
2df9d40aa6 Minor code cleanup refactor for DeepSeek models (#6324) 2025-05-16 19:06:03 -07:00
fzyzcjy
8dc191f237 Fix one wasted kernel in DeepSeek and minor refactor (#6316) 2025-05-16 19:05:33 -07:00
fzyzcjy
f194e14fb7 Reduce MoE memory usage (#6147) 2025-05-15 09:38:28 -07:00
Cheng Wan
b2e95f62b4 Fix two issues related to --moe-dense-tp-size=1 (#5657)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
2025-05-12 23:51:39 -07:00
Cheng Wan
25c83fff6a Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
2025-05-11 23:36:29 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
JieXin Liang
c178abdabc [fix] fix determine_n_share_experts_fusion (#6118) 2025-05-10 01:19:09 -07:00
xu-yfei
e30c273bc9 opt flashinfer mla cat (#5822)
Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
2025-05-08 23:17:14 -07:00
JieXin Liang
5e02330137 [perf] dsv3 bmm fallback to bf16 (#5662) 2025-05-08 11:43:39 -07:00
lukec
acc816d8a2 DeepEP normal support deepgemm-contiguous (#5626)
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
Co-authored-by: ZhengHSI <zhenghsi@qq.com>
2025-05-08 01:20:32 -07:00
Baizhou Zhang
73600673bb Clean logs for DeepSeek-V3 launching (#6079) 2025-05-07 18:54:50 -07:00
JieXin Liang
b70957fcf8 [refactor] slightly tidy fp8 module (#5993) 2025-05-07 17:28:24 -07:00
Ke Bao
d8ab60117f Overlap qk norm with two streams (#5977) 2025-05-02 09:26:30 -07:00
Ke Bao
de2faef97e Remove extra contiguous (#5953) 2025-05-01 09:28:46 -07:00
liwenju0
8fefdd32c7 [Feature] Add support for Kimi-VL model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Ke Bao
dd408ee481 Auto set draft model path for MTP (#5793) 2025-04-29 16:25:40 -07:00
Ke Bao
799c4bb502 Fuse MLA set kv cache kernel (#5748) 2025-04-26 18:42:22 -07:00
Ke Bao
c3948ba67e Reorder loop in shared expert weight loading (#5719) 2025-04-25 17:27:42 -07:00
Yuhong Guo
5d93a950ee [BugFix] Fix combination of MTP and --n-share-experts-fusion with R1 (#5707) 2025-04-24 21:13:51 +08:00
fzyzcjy
71d1785f2d Remove unnecessary torch.full in DeepSeek (#5601) 2025-04-22 21:24:29 -07:00
Baizhou Zhang
3f87f83116 Fuse q_a_proj and kv_a_proj (#5619) 2025-04-22 20:35:08 -07:00
Ke Bao
6b6e748775 Remove q concat in FA3 backend for DeepSeek decode (#5638) 2025-04-22 11:43:12 -07:00
lambert0312
76d17c7ecb Fix shared experts fusion error without quantization (#5632) 2025-04-22 09:22:26 -07:00
JieXin Liang
4418f599a5 Fix FA3 DeepSeek prefill performance regression (#5624)
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-04-22 01:41:41 -07:00
JieXin Liang
506be6b892 [fix] fix compile_deep_gemm missing kv_b_proj (#5620) 2025-04-22 00:06:36 -07:00
Ke Bao
11b23ae97b Remove extra copy in deepseek forward absorb (#5578)
Co-authored-by: saienduri <saimanas.enduri@amd.com>
2025-04-21 19:33:21 -07:00