chenxu140 | 01d47a27b6 | [Bugfix] fix kv buffer register & dp attention & deepepmoe (#9327) | 2025-08-19 10:09:48 -07:00
Trevor Morris | eff4eb3fdd | Add fp4 quantize before all-gather for Flashinfer cutlass MoE DP (max throughput) (#7667) | 2025-08-15 22:08:11 -07:00
Cheng Wan | b87aacb5c5 | [DP Attention] Refactor: adding some utility functions (#9136) | 2025-08-13 21:08:06 -07:00
Huaixin Chang | 98457c0453 | [Bugfix] Avoid unnecessary reduce-scatter call in prepare_mlp (#9169) | 2025-08-13 21:04:41 -07:00
Cheng Wan | fd7e15b76d | Revert "[bug fix] Ensure local token and global token buffers are pointing to different storage " (#8993) | 2025-08-08 21:34:17 -07:00
Elfie Guo | 92cbef59ec | [bug fix] Ensure local token and global token buffers are pointing to different storage (#8785) | 2025-08-08 15:13:32 -07:00
Trevor Morris | c0e84297c2 | Use reduce scatter for DP (#8539) | 2025-08-06 16:21:26 -07:00
Cheng Wan | c0fb25e949 | DP Enhancement (#8280) | 2025-07-24 21:36:21 -07:00
Cheng Wan | 6c903611ca | Fix incorrect spec_num_draft_tokens in draft_extend (#7757) | 2025-07-05 02:18:16 -07:00
Sheng Qi | cfe2edac38 | [BUG] fix local_rank in initialize_dp_attention (#7584) | 2025-06-27 20:01:01 -07:00
Atream | 02bf31ef29 | [fix] PD disaggregation when enable mtp and tp!=dp (#7420) | 2025-06-21 12:03:11 -07:00
Cheng Wan | e879d8b7a8 | [Feature] Comprehensive Hybrid Parallelism Support (#6389) | 2025-06-20 14:43:11 -07:00
u4lr451 | 10d60cd41b | feat: mtp support dp-attention (#6081); Co-authored-by: austindeng <austindeng@tencent.com>; Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>; Co-authored-by: Qiaolin Yu <liin1211@outlook.com>; Co-authored-by: ch-wan <cwan39@gatech.edu> | 2025-06-17 00:33:28 -07:00
zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00
Fr4nk1in | 4bd2952a37 | feat: add dp attention support for Qwen 2/3 MoE models, fixes #6088 (#6121); Co-authored-by: King.Zevin <zevin@mail.ustc.edu.cn>; Co-authored-by: Yi Zhang <1109276519@qq.com> | 2025-05-16 14:44:10 -07:00
Cheng Wan | b2e95f62b4 | Fix two issues related to --moe-dense-tp-size=1 (#5657); Co-authored-by: liusy58 <liusy58@linux.alibaba.com>; Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com> | 2025-05-12 23:51:39 -07:00
Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00
applesaucethebun | d738ab52f8 | fix some typos (#6209); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00
Cheng Wan | 25c83fff6a | Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558); Co-authored-by: liusy58 <liusy58@linux.alibaba.com> | 2025-05-11 23:36:29 -07:00
applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00
Ying Sheng | 11383cec3c | [PP] Add pipeline parallelism (#5724) | 2025-04-30 18:18:07 -07:00
lukec | 417b44eba8 | [Feat] upgrade pytorch2.6 (#5417) | 2025-04-20 16:06:34 -07:00
fzyzcjy | defede5073 | Fix DeepSeek DP Attention + torch compile (#5367); Co-authored-by: ispobock <ispobaoke@163.com> | 2025-04-14 01:07:58 -07:00
tarinkk | 7f19e083c1 | Support (1 <= dp < tp) in the dp attention in DeepEP (#4770); Co-authored-by: Cheng Wan <cwan39@gatech.edu> | 2025-03-27 17:09:35 -07:00
Cheng Wan | 3196999f63 | Reduce computation and communication in DP attention (#4521) | 2025-03-18 13:41:36 -07:00
Lianmin Zheng | 5493c3343e | Fix data parallel + tensor parallel (#4499) | 2025-03-17 05:13:16 -07:00
Lianmin Zheng | 8e66fbecee | Improve DP attention (#4390); Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-03-13 08:23:56 -07:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00
Lianmin Zheng | 1dda8c5e4c | Return more infos for computing average acceptance length (#3152) | 2025-01-26 04:51:54 -08:00
Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907); Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00
Lianmin Zheng | 8b6ce52e92 | Support multi-node DP attention (#2925); Co-authored-by: dhou-xai <dhou@x.ai> | 2025-01-16 11:15:00 -08:00