Cheng Wan | f003cd3548 | [CI] Fix CI tests (#9050); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-08-10 23:52:05 -07:00
Guanhua Wang | f7b2853ff8 | [feat] support minimum token load balance in dp attention (#7379) | 2025-08-03 00:46:47 -07:00
Cheng Wan | d487555f84 | [CI] Add deepep tests to CI (#7872) | 2025-07-09 01:49:47 -07:00
Qiaolin Yu | 41650b0d70 | feat: support compatibility between MTP and two-batch-overlap (#7225); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-06-27 01:10:27 -07:00
u4lr451 | 10d60cd41b | feat: mtp support dp-attention (#6081); Co-authored-by: austindeng <austindeng@tencent.com>; Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>; Co-authored-by: Qiaolin Yu <liin1211@outlook.com>; Co-authored-by: ch-wan <cwan39@gatech.edu> | 2025-06-17 00:33:28 -07:00
fzyzcjy | defede5073 | Fix DeepSeek DP Attention + torch compile (#5367); Co-authored-by: ispobock <ispobaoke@163.com> | 2025-04-14 01:07:58 -07:00
Lianmin Zheng | 4ede6770cd | Fix retract for page size > 1 (#4914) | 2025-03-30 02:57:15 -07:00
fzyzcjy | 15ddd84322 | Add retry for flaky tests in CI (#4755) | 2025-03-25 16:53:12 -07:00
Cheng Wan | 3196999f63 | Reduce computation and communication in DP attention (#4521) | 2025-03-18 13:41:36 -07:00
Lianmin Zheng | 8e66fbecee | Improve DP attention (#4390); Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-03-13 08:23:56 -07:00
Lianmin Zheng | c76040e31b | Support page size > 1 (#4356) | 2025-03-12 22:22:39 -07:00
Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00
Lianmin Zheng | f719d9aebc | Launch dp ranks in parallel (#2053); Co-authored-by: Haotian Liu <6631389+haotian-liu@users.noreply.github.com> | 2024-11-16 17:39:39 -08:00
Ke Bao | 976bc302e5 | Support DP MLA (#1970) | 2024-11-16 09:01:43 +00:00