| Author | Commit | Message | Date |
|---|---|---|---|
| Li Hui | 69dd878b51 | Fix shared experts fusion error (#6289) | 2025-05-30 01:16:11 -07:00 |
| Zilin Zhu | 51cdd81f97 | [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight (#6265) | 2025-05-29 16:28:10 -07:00 |
| fzyzcjy | 31589e177e | Speed up when having padding tokens two-batch overlap (#6668) (Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>) | 2025-05-28 16:00:58 -07:00 |
| fzyzcjy | 541a985f85 | Fuse routed_scaling_factor in DeepSeek (#6710) | 2025-05-28 15:53:37 -07:00 |
| HAI | 183d9f969c | DeepSeek: enable none block-quant FP8 quantizations (#6638) | 2025-05-27 09:06:40 -07:00 |
| fzyzcjy | 32cd707002 | Support TP in attention for two batch overlap (#6634) | 2025-05-26 20:28:12 -07:00 |
| fzyzcjy | 0ca3e56802 | Tiny fix missing expert location dispatch info (#6620) | 2025-05-26 08:58:31 -07:00 |
| Yi Zhang | 65f091310c | refactor qwen moe code, use communicator to support tp+dp (#6581) | 2025-05-25 23:01:10 -07:00 |
| fzyzcjy | 0d47788025 | Support overlapping two batches (#4068) | 2025-05-24 17:39:07 -07:00 |
| fzyzcjy | b2388433be | Add back DeepSeek non-TBO branches (#6578) | 2025-05-24 17:34:00 -07:00 |
| fzyzcjy | a38376fa99 | Refactor attention into multiple stages (#6477) | 2025-05-24 17:33:25 -07:00 |
| fzyzcjy | fc992a09f9 | Support updating expert locations dynamically (#6388) | 2025-05-21 21:59:33 -07:00 |
| Baizhou Zhang | d4c038daed | [Fix] Fix capture fail bug for DeepSeek (#6275) | 2025-05-21 11:11:20 -07:00 |
| fzyzcjy | ccfe5c009d | Support redundant experts in expert parallel (#6461) | 2025-05-21 02:05:53 -07:00 |
| fzyzcjy | d6e1d28c8a | Refactor DeepSeek attention dispatching (#6476) | 2025-05-21 02:03:39 -07:00 |
| Lianmin Zheng | 03886917bd | Disable all two stream overlap on amd (#6475) | 2025-05-20 19:06:59 -07:00 |
| fzyzcjy | 13feffd082 | Fix master CI for DeepSeek (#6447) | 2025-05-20 00:31:42 -07:00 |
| fzyzcjy | e98afbe042 | Support dispatching logical to physical experts (#6385) | 2025-05-19 22:13:55 -07:00 |
| HAI | 6317c5c61f | Address performance regression: disable multiple streams on ROCm (#6412) | 2025-05-19 21:16:20 -07:00 |
| fzyzcjy | d0443275f0 | Refactor DeepSeek logic into atomic operations (#6326) | 2025-05-19 21:05:30 -07:00 |
| fzyzcjy | 1b19df4b2a | Refactor communication logic of DeepSeek for extensibility and understandability (#6321) | 2025-05-19 20:14:48 -07:00 |
| fzyzcjy | f0653886a5 | Expert distribution recording without overhead for EPLB (#4957) | 2025-05-19 20:07:43 -07:00 |
| fzyzcjy | 72bfb0baf0 | Refactor DeepSeek MoE layer to unify the two forward branches (#6325) | 2025-05-18 15:34:36 -07:00 |
| fzyzcjy | 2716830802 | Speed up when having padding tokens in DeepEP (#6175) | 2025-05-17 16:44:05 -07:00 |
| fzyzcjy | 2df9d40aa6 | Minor code cleanup refactor for DeepSeek models (#6324) | 2025-05-16 19:06:03 -07:00 |
| fzyzcjy | 8dc191f237 | Fix one wasted kernel in DeepSeek and minor refactor (#6316) | 2025-05-16 19:05:33 -07:00 |
| fzyzcjy | f194e14fb7 | Reduce MoE memory usage (#6147) | 2025-05-15 09:38:28 -07:00 |
| Cheng Wan | b2e95f62b4 | Fix two issues related to --moe-dense-tp-size=1 (#5657) (Co-authored-by: liusy58 <liusy58@linux.alibaba.com>; 颉沆 <xiehang.lsy@alibaba-inc.com>) | 2025-05-12 23:51:39 -07:00 |
| Cheng Wan | 25c83fff6a | Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558) (Co-authored-by: liusy58 <liusy58@linux.alibaba.com>) | 2025-05-11 23:36:29 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| JieXin Liang | c178abdabc | [fix] fix determine_n_share_experts_fusion (#6118) | 2025-05-10 01:19:09 -07:00 |
| xu-yfei | e30c273bc9 | opt flashinfer mla cat (#5822) (Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>) | 2025-05-08 23:17:14 -07:00 |
| JieXin Liang | 5e02330137 | [perf] dsv3 bmm fallback to bf16 (#5662) | 2025-05-08 11:43:39 -07:00 |
| lukec | acc816d8a2 | DeepEP normal support deepgemm-contiguous (#5626) (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>; Cheng Wan <54331508+ch-wan@users.noreply.github.com>; Xuting Zhou <xutingz@nvidia.com>; ZhengHSI <zhenghsi@qq.com>) | 2025-05-08 01:20:32 -07:00 |
| Baizhou Zhang | 73600673bb | Clean logs for DeepSeek-V3 launching (#6079) | 2025-05-07 18:54:50 -07:00 |
| JieXin Liang | b70957fcf8 | [refactor] slightly tidy fp8 module (#5993) | 2025-05-07 17:28:24 -07:00 |
| Ke Bao | d8ab60117f | Overlap qk norm with two streams (#5977) | 2025-05-02 09:26:30 -07:00 |
| Ke Bao | de2faef97e | Remove extra contiguous (#5953) | 2025-05-01 09:28:46 -07:00 |
| liwenju0 | 8fefdd32c7 | [Feature] add support kimi vl model (#5383) (Co-authored-by: wenju.li <wenju.li@deepctr.cn>) | 2025-04-29 21:31:19 -07:00 |
| Ke Bao | dd408ee481 | Auto set draft model path for MTP (#5793) | 2025-04-29 16:25:40 -07:00 |
| Ke Bao | 799c4bb502 | Fuse MLA set kv cache kernel (#5748) | 2025-04-26 18:42:22 -07:00 |
| Ke Bao | c3948ba67e | Reorder loop in shared expert weight loading (#5719) | 2025-04-25 17:27:42 -07:00 |
| Yuhong Guo | 5d93a950ee | [BugFix] Fix combination of MTP and --n-share-experts-fusion with R1 (#5707) | 2025-04-24 21:13:51 +08:00 |
| fzyzcjy | 71d1785f2d | Remove unnecessary torch.full in DeepSeek (#5601) | 2025-04-22 21:24:29 -07:00 |
| Baizhou Zhang | 3f87f83116 | Fuse q_a_proj and kv_a_proj (#5619) | 2025-04-22 20:35:08 -07:00 |
| Ke Bao | 6b6e748775 | Remove q concat in FA3 backend for DeepSeek decode (#5638) | 2025-04-22 11:43:12 -07:00 |
| lambert0312 | 76d17c7ecb | Fix shared experts fusion error without quantization (#5632) | 2025-04-22 09:22:26 -07:00 |
| JieXin Liang | 4418f599a5 | Fix FA3 DeepSeek prefill performance regression (#5624) (Co-authored-by: ispobock <ispobaoke@gmail.com>) | 2025-04-22 01:41:41 -07:00 |
| JieXin Liang | 506be6b892 | [fix] fix compile_deep_gemm missing kv_b_proj (#5620) | 2025-04-22 00:06:36 -07:00 |
| Ke Bao | 11b23ae97b | Remove extra copy in deepseek forward absorb (#5578) (Co-authored-by: saienduri <saimanas.enduri@amd.com>) | 2025-04-21 19:33:21 -07:00 |