66 Commits

Author SHA1 Message Date
Jimmy
56b991b12d [Feature]feat(get_ip): unify get_ip_xxx (#10081) 2025-09-18 22:35:26 -07:00
Lianmin Zheng
956d805dde [Auto Sync] Update parallel_state.py (20250911) (#10326)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-09-11 06:36:29 -07:00
Lianmin Zheng
4582931ac3 Revert "Revert the changes on NCCL symmetric memory" (#10238) 2025-09-09 12:11:49 -07:00
Lianmin Zheng
d352c29aa0 Revert the changes on NCCL symmetric memory (#10210)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-09-09 11:01:33 -07:00
Cao E
7577f0e40f Add graph runner support with torch compile on CPU (#7843) 2025-09-07 21:33:58 -07:00
Lianmin Zheng
617aa2b248 [Auto Sync] Update parallel_state.py (20250907) (#10126)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jzhou-xai <jzhou@x.ai>
2025-09-07 02:12:32 -07:00
Cheng Wan
453511acc7 Save memory for expert model parallel (#9957) 2025-09-04 13:31:47 -07:00
Lianmin Zheng
397448ebbc [Auto Sync] Update parallel_state.py, few_shot_gsm8k.py (20250903) (#9986)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Leon Gao <leon.gao19@gmail.com>
2025-09-03 16:55:43 -07:00
Xiaoyu Zhang
a1e5d78115 fix parallel_state.py current_platform bug (#9919) 2025-09-02 03:17:15 -07:00
Lianmin Zheng
1e61b4960f [Auto Sync] Update parallel_state.py (20250830) (#9828)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-30 14:25:39 -07:00
fzyzcjy
2600fc0d47 Overlapped weight offload (#8034) 2025-08-23 02:06:46 -07:00
VDV1985
2c4b4b786b [feature] Ascend NPU graph support (#9399)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-20 21:13:27 -07:00
Even Zhou
de2dd73831 Revert "[feature] Rework Ascend NPU graph support" (#9385) 2025-08-20 00:35:10 -07:00
Even Zhou
3680d6f88b [feature] Rework Ascend NPU graph support (#9350)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-19 20:32:27 -07:00
Even Zhou
f4fafacc5d Revert "[feature] Ascend NPU graph support (#8027)" (#9348) 2025-08-19 10:11:23 -07:00
VDV1985
94371dbbd6 [feature] Ascend NPU graph support (#8027)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-16 17:25:17 -07:00
Trevor Morris
eff4eb3fdd Add fp4 quantize before all-gather for Flashinfer cutlass MoE DP (max throughput) (#7667) 2025-08-15 22:08:11 -07:00
kk
983aa4967b Fix nan value generated after custom all reduce (#8663)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-08-15 12:33:54 -07:00
Even Zhou
137e75daa1 [Feature] Optimize DeepSeek's DeepEP on Ascend NPU (#8355)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: Hexq0210 <hexq0809521@gmail.com>
2025-08-09 01:35:00 -07:00
blzheng
62f8eb48b1 [CPU] Fix fallback allgather issue (#8041) 2025-08-07 00:08:18 -07:00
Nicolas Castet
82e6c3a65a Add support for NCCL symmetric memory for TP allreduces (#8238) 2025-08-01 23:30:55 +00:00
Yineng Zhang
0ad098b494 Revert "Fix nan value generated after custom all reduce (#8532)" (#8642) 2025-07-31 17:26:49 -07:00
kk
4a6e7a66a0 Fix nan value generated after custom all reduce (#8532) 2025-07-31 16:15:43 -07:00
Cheng Wan
7a1f7fc504 [Feature] Hybrid EP and TP (#8590) 2025-07-31 02:53:25 -07:00
Stepan Kargaltsev
1b9cea5ade [P/D] Support ipv6 in P/D scenario (#7858)
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-07-25 08:53:30 -07:00
Cheng Wan
c0fb25e949 DP Enhancement (#8280) 2025-07-24 21:36:21 -07:00
li haoyang
28d4d47280 [Feature] Integrate quick allreduce and select the best allreduce implementation (#6619)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-24 20:48:42 -07:00
Sai Enduri
d0510f08fe Revert "Fix different device type adjustment in PP" (#8141) 2025-07-18 01:12:11 -07:00
Qiaolin Yu
3bc43c683e Fix different device type adjustment in PP (#7760) 2025-07-15 19:37:14 -07:00
ykcombat
d4d0c7c367 [Feature]TP Group Switching for PD-Multiplexing (#7653) 2025-07-15 02:35:46 +08:00
TianyuZhang1214
0099172327 feat: use D2D instead of H2H in pp (#7673)
Co-authored-by: alpha-baby <fujianhao1997@qq.com>
2025-07-03 10:58:50 -07:00
Chunyuan WU
8f844db699 [CPU] fix all_reduce and all_gather (#6770)
Co-authored-by: blzheng <beilei.zheng@intel.com>
2025-07-02 22:39:45 -07:00
Cheng Wan
8609e637a9 Fix All-Gather under world size one (#7219) 2025-06-20 14:57:34 -07:00
zyksir
8e3797be1c support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) 2025-06-04 22:11:24 -07:00
Baizhou Zhang
bdaefbbfbd Add environment flag for disabling message queue broadcaster (#6403) 2025-05-26 22:32:41 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
Hubert Lu
2a936a841e [AMD] switch to custom allreduce regardless of MSCCL setting on ROCm (#6097) 2025-05-08 13:46:58 -07:00
Baizhou Zhang
73600673bb Clean logs for DeepSeek-V3 launching (#6079) 2025-05-07 18:54:50 -07:00
Song Zhang
00c2c1f08b [Feature] Support for Ascend NPU backend (#3853)
Signed-off-by: Song Zhang <gepin.zs@antgroup.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
2025-05-06 20:32:53 -07:00
Adarsh Shirawalmath
683707c314 [Security][Bug] Prevent binding to all TCP interfaces (#5752) 2025-05-06 03:21:45 +08:00
Lianmin Zheng
9c088829ee Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (#5786) 2025-04-27 04:03:02 -07:00
Wenxuan Tan
dfb322642f Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (#5728) 2025-04-26 18:11:09 -07:00
Yi Zhang
aba5ca154d python transfer custom allreduce from trt kernel to vllm kernel (#5080) 2025-04-05 15:35:55 -07:00
JieXin Liang
a995a773a0 [fix] remove cuda_device_count_stateless (#5060) 2025-04-04 00:18:26 -07:00
Kebe
10a9ab7b07 Fix error due to CustomAllreduce setup failure (#4815)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-03-27 18:52:10 -07:00
tarinkk
7f19e083c1 Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
2025-03-27 17:09:35 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
Cheng Wan
3196999f63 Reduce computation and communication in DP attention (#4521) 2025-03-18 13:41:36 -07:00