Commit Graph

25 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Sai Enduri | d0510f08fe | Revert "Fix different device type adjustment in PP" (#8141) | 2025-07-18 01:12:11 -07:00 |
| Qiaolin Yu | 3bc43c683e | Fix different device type adjustment in PP (#7760) | 2025-07-15 19:37:14 -07:00 |
| ykcombat | d4d0c7c367 | [Feature] TP Group Switching for PD-Multiplexing (#7653) | 2025-07-15 02:35:46 +08:00 |
| TianyuZhang1214 | 0099172327 | feat: use D2D instead of H2H in pp (#7673)<br>Co-authored-by: alpha-baby <fujianhao1997@qq.com> | 2025-07-03 10:58:50 -07:00 |
| Chunyuan WU | 8f844db699 | [CPU] fix all_reduce and all_gather (#6770)<br>Co-authored-by: blzheng <beilei.zheng@intel.com> | 2025-07-02 22:39:45 -07:00 |
| Cheng Wan | 8609e637a9 | Fix All-Gather under world size one (#7219) | 2025-06-20 14:57:34 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Baizhou Zhang | bdaefbbfbd | Add environment flag for disabling message queue broadcaster (#6403) | 2025-05-26 22:32:41 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209)<br>Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-13 01:42:38 +08:00 |
| Song Zhang | 00c2c1f08b | [Feature] Support for Ascend NPU backend (#3853)<br>Signed-off-by: Song Zhang <gepin.zs@antgroup.com><br>Co-authored-by: 22dimensions <waitingwind@foxmail.com> | 2025-05-06 20:32:53 -07:00 |
| Lianmin Zheng | 9c088829ee | Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (#5786) | 2025-04-27 04:03:02 -07:00 |
| Wenxuan Tan | dfb322642f | Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (#5728) | 2025-04-26 18:11:09 -07:00 |
| Kebe | 10a9ab7b07 | Fix error due to CustomAllreduce setup failure (#4815)<br>Signed-off-by: Kebe <mail@kebe7jun.com> | 2025-03-27 18:52:10 -07:00 |
| tarinkk | 7f19e083c1 | Support (1 <= dp < tp) in the dp attention in DeepEP (#4770)<br>Co-authored-by: Cheng Wan <cwan39@gatech.edu> | 2025-03-27 17:09:35 -07:00 |
| Xiaoyu Zhang | 04e3ff6975 | Support compressed tensors fp8w8a8 (#4743) | 2025-03-26 13:21:25 -07:00 |
| Cheng Wan | 3196999f63 | Reduce computation and communication in DP attention (#4521) | 2025-03-18 13:41:36 -07:00 |
| Chen Shengzhi | f1cf6eefbe | [Fix] Check the device backend before calling empty_cache function (#4212) | 2025-03-12 21:28:48 -07:00 |
| Ke Bao | 00ce7e311c | Fix all gather torch compile (#3992)<br>Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-03-02 00:41:38 -08:00 |
| Nicolas Castet | 127998cc41 | Fix allgather ops inside cuda graphs (#3709) | 2025-02-25 08:39:10 -08:00 |
| Shenggui Li | c0bb9eb3b3 | [improve] made timeout configurable (#3803) | 2025-02-25 00:26:08 -08:00 |
| Lianmin Zheng | ea535dc574 | Revert "disable custom allreduce on HIP" (#3067) | 2025-01-22 21:33:35 -08:00 |
| Hui Liu | ddc2001fb0 | disable custom allreduce on HIP (#3058) | 2025-01-22 13:57:22 -08:00 |
| Lianmin Zheng | 73401fd016 | Sync distributed package from vllm 0.6.4.post1 (#3010) | 2025-01-20 04:57:14 -08:00 |
| yizhang2077 | d5b95cbb53 | adapt vllm distributed module to sglang (#2244)<br>Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-12-01 15:54:52 +08:00 |