| Author | Commit | Message | Date |
|---|---|---|---|
| Jimmy | 56b991b12d | [Feature]feat(get_ip): unify get_ip_xxx (#10081) | 2025-09-18 22:35:26 -07:00 |
| Lianmin Zheng | 956d805dde | [Auto Sync] Update parallel_state.py (20250911) (#10326); Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>; Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2025-09-11 06:36:29 -07:00 |
| Lianmin Zheng | 4582931ac3 | Revert "Revert the changes on NCCL symmetric memory" (#10238) | 2025-09-09 12:11:49 -07:00 |
| Lianmin Zheng | d352c29aa0 | Revert the changes on NCCL symmetric memory (#10210); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-09-09 11:01:33 -07:00 |
| Cao E | 7577f0e40f | Add graph runner support with torch compile on CPU (#7843) | 2025-09-07 21:33:58 -07:00 |
| Lianmin Zheng | 617aa2b248 | [Auto Sync] Update parallel_state.py (20250907) (#10126); Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>; Co-authored-by: jzhou-xai <jzhou@x.ai> | 2025-09-07 02:12:32 -07:00 |
| Cheng Wan | 453511acc7 | Save memory for expert model parallel (#9957) | 2025-09-04 13:31:47 -07:00 |
| Lianmin Zheng | 397448ebbc | [Auto Sync] Update parallel_state.py, few_shot_gsm8k.py (20250903) (#9986); Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>; Co-authored-by: Leon Gao <leon.gao19@gmail.com> | 2025-09-03 16:55:43 -07:00 |
| Xiaoyu Zhang | a1e5d78115 | fix parallel_state.py current_platform bug (#9919) | 2025-09-02 03:17:15 -07:00 |
| Lianmin Zheng | 1e61b4960f | [Auto Sync] Update parallel_state.py (20250830) (#9828); Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-08-30 14:25:39 -07:00 |
| fzyzcjy | 2600fc0d47 | Overlapped weight offload (#8034) | 2025-08-23 02:06:46 -07:00 |
| VDV1985 | 2c4b4b786b | [feature] Ascend NPU graph support (#9399); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>; Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>; Co-authored-by: Maksim <makcum888e@mail.ru>; Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com> | 2025-08-20 21:13:27 -07:00 |
| Even Zhou | de2dd73831 | Revert "[feature] Rework Ascend NPU graph support" (#9385) | 2025-08-20 00:35:10 -07:00 |
| Even Zhou | 3680d6f88b | [feature] Rework Ascend NPU graph support (#9350); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>; Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>; Co-authored-by: Maksim <makcum888e@mail.ru>; Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com> | 2025-08-19 20:32:27 -07:00 |
| Even Zhou | f4fafacc5d | Revert "[feature] Ascend NPU graph support (#8027)" (#9348) | 2025-08-19 10:11:23 -07:00 |
| VDV1985 | 94371dbbd6 | [feature] Ascend NPU graph support (#8027); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>; Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>; Co-authored-by: Maksim <makcum888e@mail.ru>; Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com> | 2025-08-16 17:25:17 -07:00 |
| Trevor Morris | eff4eb3fdd | Add fp4 quantize before all-gather for Flashinfer cutlass MoE DP (max throughput) (#7667) | 2025-08-15 22:08:11 -07:00 |
| kk | 983aa4967b | Fix nan value generated after custom all reduce (#8663); Co-authored-by: wunhuang <wunhuang@amd.com> | 2025-08-15 12:33:54 -07:00 |
| Even Zhou | 137e75daa1 | [Feature] Optimize DeepSeek's DeepEP on Ascend NPU (#8355); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: Hexq0210 <hexq0809521@gmail.com> | 2025-08-09 01:35:00 -07:00 |
| blzheng | 62f8eb48b1 | [CPU] Fix fallback allgather issue (#8041) | 2025-08-07 00:08:18 -07:00 |
| Nicolas Castet | 82e6c3a65a | Add support for NCCL symmetric memory for TP allreduces (#8238) | 2025-08-01 23:30:55 +00:00 |
| Yineng Zhang | 0ad098b494 | Revert "Fix nan value generated after custom all reduce (#8532)" (#8642) | 2025-07-31 17:26:49 -07:00 |
| kk | 4a6e7a66a0 | Fix nan value generated after custom all reduce (#8532) | 2025-07-31 16:15:43 -07:00 |
| Cheng Wan | 7a1f7fc504 | [Feature] Hybrid EP and TP (#8590) | 2025-07-31 02:53:25 -07:00 |
| Stepan Kargaltsev | 1b9cea5ade | [P/D] Support ipv6 in P/D scenario (#7858); Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-07-25 08:53:30 -07:00 |
| Cheng Wan | c0fb25e949 | DP Enhancement (#8280) | 2025-07-24 21:36:21 -07:00 |
| li haoyang | 28d4d47280 | [Feature] Integrate quick allreduce and select the best allreduce implementation (#6619); Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>; Co-authored-by: ilmarkov <imarkov@redhat.com> | 2025-07-24 20:48:42 -07:00 |
| Sai Enduri | d0510f08fe | Revert "Fix different device type adjustment in PP" (#8141) | 2025-07-18 01:12:11 -07:00 |
| Qiaolin Yu | 3bc43c683e | Fix different device type adjustment in PP (#7760) | 2025-07-15 19:37:14 -07:00 |
| ykcombat | d4d0c7c367 | [Feature]TP Group Switching for PD-Multiplexing (#7653) | 2025-07-15 02:35:46 +08:00 |
| TianyuZhang1214 | 0099172327 | feat: use D2D instead of H2H in pp (#7673); Co-authored-by: alpha-baby <fujianhao1997@qq.com> | 2025-07-03 10:58:50 -07:00 |
| Chunyuan WU | 8f844db699 | [CPU] fix all_reduce and all_gather (#6770); Co-authored-by: blzheng <beilei.zheng@intel.com> | 2025-07-02 22:39:45 -07:00 |
| Cheng Wan | 8609e637a9 | Fix All-Gather under world size one (#7219) | 2025-06-20 14:57:34 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Baizhou Zhang | bdaefbbfbd | Add environment flag for disabling message queue broadcaster (#6403) | 2025-05-26 22:32:41 -07:00 |
| Lifu Huang | 3cf1473a09 | Use monotonic clock for interval measurement (#6211); Signed-off-by: Lifu Huang <lifu.hlf@gmail.com> | 2025-05-17 16:49:18 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| Hubert Lu | 2a936a841e | [AMD] switch to custom allreduce regardless of MSCCL setting on ROCm (#6097) | 2025-05-08 13:46:58 -07:00 |
| Baizhou Zhang | 73600673bb | Clean logs for DeepSeek-V3 launching (#6079) | 2025-05-07 18:54:50 -07:00 |
| Song Zhang | 00c2c1f08b | [Feature] Support for Ascend NPU backend (#3853); Signed-off-by: Song Zhang <gepin.zs@antgroup.com>; Co-authored-by: 22dimensions <waitingwind@foxmail.com> | 2025-05-06 20:32:53 -07:00 |
| Adarsh Shirawalmath | 683707c314 | [Security][Bug] Prevent binding to all TCP interfaces (#5752) | 2025-05-06 03:21:45 +08:00 |
| Lianmin Zheng | 9c088829ee | Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (#5786) | 2025-04-27 04:03:02 -07:00 |
| Wenxuan Tan | dfb322642f | Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (#5728) | 2025-04-26 18:11:09 -07:00 |
| Yi Zhang | aba5ca154d | python transfer custom allreduce from trt kernel to vllm kernel (#5080) | 2025-04-05 15:35:55 -07:00 |
| JieXin Liang | a995a773a0 | [fix] remove cuda_device_count_stateless (#5060) | 2025-04-04 00:18:26 -07:00 |
| Kebe | 10a9ab7b07 | Fix error due to CustomAllreduce setup failure (#4815); Signed-off-by: Kebe <mail@kebe7jun.com> | 2025-03-27 18:52:10 -07:00 |
| tarinkk | 7f19e083c1 | Support (1 <= dp < tp) in the dp attention in DeepEP (#4770); Co-authored-by: Cheng Wan <cwan39@gatech.edu> | 2025-03-27 17:09:35 -07:00 |
| Xiaoyu Zhang | 04e3ff6975 | Support compressed tensors fp8w8a8 (#4743) | 2025-03-26 13:21:25 -07:00 |
| Cheng Wan | 3196999f63 | Reduce computation and communication in DP attention (#4521) | 2025-03-18 13:41:36 -07:00 |