[bugfix] layerwise D first plan (#3866)

### What this PR does / why we need it?

Refactored the layerwise code to send to the D (decode) node first. This prevents P (prefill) node hangs caused by communication timeouts when DP > 1.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-30 22:20:34 +08:00
parent 627f20ce26
commit 2c291bc63f
4 changed files with 963 additions and 1354 deletions
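The `parallel_state.py` hunk below replaces `group_ranks.view(-1, alltoall_group_size)` with `group_ranks.reshape(-1, alltoall_group_size)`. The PR text does not say why. A likely reason is that `Tensor.view()` never copies, so it only works when the tensor's strides can express the target shape. For example, it fails on a transposed, non-contiguous tensor. `Tensor.reshape()` returns a view when it can and copies otherwise. The minimal sketch below shows that difference under an assumed rank layout. `split_into_groups` and its parameters are hypothetical and do not mirror the actual vllm-ascend code.

```python
import torch


def split_into_groups(world_size: int, dp_size: int, group_size: int) -> list[list[int]]:
    """Split a hypothetical rank table into fixed-size groups.

    Assumption (illustrative only): ranks are first arranged as
    [world_size // dp_size, dp_size] and then transposed so DP comes first,
    which leaves the tensor non-contiguous.
    """
    # Transposing swaps strides without moving data, so the result is non-contiguous.
    ranks = torch.arange(world_size).reshape(-1, dp_size).t()

    # ranks.view(-1, group_size) would raise a RuntimeError here, because view()
    # cannot express the new shape with the existing strides and never copies.
    # reshape() returns a view when possible and falls back to a copy otherwise.
    groups = ranks.reshape(-1, group_size).unbind(0)
    return [g.tolist() for g in groups]
```

For example, `split_into_groups(8, 2, 2)` returns `[[0, 2], [4, 6], [1, 3], [5, 7]]`. Using `.reshape()` keeps the grouping correct whether or not the upstream tensor happens to be contiguous. The possible copy costs little for a rank table of this size.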
--- a/vllm_ascend/distributed/parallel_state.py
+++ b/vllm_ascend/distributed/parallel_state.py
@@ -96,7 +96,8 @@ def init_ascend_model_parallel(parallel_config: ParallelConfig, ):
                parallel_config.data_parallel_size, num_head_replica, -1,
                alltoall_group_size
            )  # [DP_size, num_head_replica, num_alltoall_group, alltoall_group_size]
-            group_ranks = group_ranks.view(-1, alltoall_group_size).unbind(0)
+            group_ranks = group_ranks.reshape(-1,
+                                              alltoall_group_size).unbind(0)
        group_ranks = [x.tolist() for x in group_ranks]
        local_rank = get_world_group().local_rank
        num = next(