[Bugfix] Fix bug when graph_size is not divisible by tp_size (#2719)
### What this PR does / why we need it?
Fixes https://github.com/vllm-project/vllm-ascend/issues/2702
- A2: skip the graph_size update that rounds it to a multiple of
tp_size; the dispatch/combine ops support different batch sizes across
EP ranks, so the rounding is unnecessary there (see the sketch after
this list)
- A3: set `max_num_reqs = max(new_graph_batch_sizes)` so that
graph_size and max_num_reqs no longer mismatch
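
A minimal sketch of the two fixes, for illustration only. The helper
and variable names (`update_graph_batch_sizes`, `is_soc_a2`,
`graph_batch_sizes`) are assumptions, not the actual vllm-ascend
identifiers:

```python
# Hedged sketch of the A2/A3 fixes; names are illustrative assumptions,
# not the actual vllm-ascend code.
def update_graph_batch_sizes(graph_batch_sizes: list[int],
                             tp_size: int,
                             is_soc_a2: bool) -> tuple[list[int], int]:
    if is_soc_a2:
        # A2 fix: skip the rounding entirely. The dispatch/combine ops
        # support different batch sizes across EP ranks, so graph sizes
        # need not be divisible by tp_size.
        new_graph_batch_sizes = list(graph_batch_sizes)
    else:
        # Other SoCs keep the original behavior: round each captured
        # graph size up to a multiple of tp_size.
        new_graph_batch_sizes = [
            -(-size // tp_size) * tp_size for size in graph_batch_sizes
        ]
    # A3 fix: derive max_num_reqs from the (possibly adjusted) graph
    # sizes so graph_size and max_num_reqs cannot diverge.
    max_num_reqs = max(new_graph_batch_sizes)
    return new_graph_batch_sizes, max_num_reqs
```

For example, `update_graph_batch_sizes([1, 8, 19], tp_size=4,
is_soc_a2=False)` would return `([4, 8, 20], 20)`, while the A2 path
leaves the sizes untouched.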
### Does this PR introduce _any_ user-facing change?
Nope
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: e599e2c65e
---------
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
```diff
@@ -1905,7 +1905,6 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         max_query_len = self.uniform_decode_query_len if uniform_decode else \
             num_tokens
 
-        max_num_reqs = self.scheduler_config.max_num_seqs
         # Set num_scheduled_tokens based on num_tokens and max_num_seqs
         # for dummy run with LoRA so that the num_reqs collectively
         # has num_tokens in total.
```
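
The comment kept in this hunk says that, for a dummy run with LoRA,
`num_tokens` is spread across requests so they sum to `num_tokens`. A
self-contained sketch of such a distribution follows; the function name
and the exact split strategy are assumptions, not the upstream code:

```python
import numpy as np

# Assumed sketch: spread num_tokens across at most max_num_reqs
# requests so the dummy run exercises realistic per-request counts.
def distribute_dummy_tokens(num_tokens: int,
                            max_num_reqs: int) -> np.ndarray:
    num_reqs = min(num_tokens, max_num_reqs)
    base = num_tokens // num_reqs
    rem = num_tokens % num_reqs
    num_scheduled_tokens = np.full(num_reqs, base, dtype=np.int32)
    # Give the remainder to the first `rem` requests so the total
    # is exactly num_tokens.
    num_scheduled_tokens[:rem] += 1
    assert int(num_scheduled_tokens.sum()) == num_tokens
    return num_scheduled_tokens
```

With the A3 fix, the `max_num_reqs` used here equals
`max(new_graph_batch_sizes)`, so `num_reqs` can never exceed the
largest captured graph size.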