[Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437)

### What this PR does / why we need it?

Fix a ZeroDivisionError that occurs when prefill_tp_size > num_kv_head. In this situation, num_head_replica can be 0 and is then used as a divisor; this PR restricts the minimum value of num_head_replica to 1. This PR also fixes the tp_resharding README.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
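The failure mode and the fix can be illustrated with a minimal standalone sketch. It is not the connector's actual code: `clamp_head_replica` and `should_skip_send` are hypothetical helper names, and how the raw replica count is derived from prefill_tp_size and num_kv_head is deliberately left out. The sketch only shows that a divisor of 0 raises in the modulo check, and that a floor of 1 makes the old `>= 1` guard unnecessary.

```python
def clamp_head_replica(raw_replica: int) -> int:
    """Hypothetical helper: enforce a floor of 1 so the value is a safe divisor."""
    return max(1, raw_replica)


def should_skip_send(tp_rank: int, num_head_replica: int) -> bool:
    """Hypothetical helper mirroring the simplified check in the diff:
    a rank skips when it is not a multiple of the replica count."""
    return tp_rank % num_head_replica != 0


# With an unclamped replica count of 0, the modulo raises ZeroDivisionError.
try:
    should_skip_send(tp_rank=3, num_head_replica=0)
    raised = False
except ZeroDivisionError:
    raised = True

# With the floor applied, the same call is well defined.
safe_replica = clamp_head_replica(0)
skip = should_skip_send(tp_rank=3, num_head_replica=safe_replica)
```

With a replica count of 1, `tp_rank % 1` is always 0, so no rank is skipped by this check; values already at or above 1 pass through the clamp unchanged.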
2025-10-15 08:45:44 +08:00
parent 02c26dcfc7
commit c2c1db78a7
4 changed files with 26 additions and 14 deletions
--- a/vllm_ascend/distributed/mooncake_layerwise_connector.py
+++ b/vllm_ascend/distributed/mooncake_layerwise_connector.py
@@ -360,7 +360,7 @@ class SendingLayerThread(threading.Thread):
        remote_kv_base_addrs = req_meta.kv_caches_base_addr

        remote_block_ids = req_meta.block_ids
-        if self.num_head_replica >= 1 and self.tp_rank % self.num_head_replica != 0:
+        if self.tp_rank % self.num_head_replica != 0:
            pass
        elif self.pd_head_ratio == 1:
            layer_local_kv_base_addr = [