[Aclgraph] Update compilation config in check_and_update_config (#2540)

### What this PR does / why we need it?

This PR updates the compilation config in `check_and_update_config`: `compilation_config.level` is now used to set `compilation_config.cudagraph_mode`, so that the two settings stay consistent. It also sets `compilation_config.cudagraph_num_of_warmups = 1` when V1 is enabled, because this value is also used in torchair graph mode.

This fixes https://github.com/vllm-project/vllm-ascend/issues/2523 and the bug where `aclgraphmode` was always `NONE` while running forward in aclgraph mode.

### How was this patch tested?

CI passed with newly added and existing tests.

- vLLM version: v0.10.1.1
- vLLM main: f58675bfb3

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
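
The config update described in the commit message can be illustrated with a minimal, self-contained sketch. The enum and dataclass below are hypothetical stand-ins, not the real vLLM classes. The function name `update_compilation_config`, the `use_v1` parameter, and the exact level-to-mode mapping are assumptions made for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class CompilationLevel(IntEnum):
    """Hypothetical stand-in for the compilation level setting."""
    NO_COMPILATION = 0
    PIECEWISE = 3


class CUDAGraphMode(Enum):
    """Hypothetical stand-in for the graph capture mode."""
    NONE = 0
    PIECEWISE = 1


@dataclass
class CompilationConfig:
    """Hypothetical stand-in holding only the fields named in the commit message."""
    level: int = CompilationLevel.PIECEWISE
    cudagraph_mode: CUDAGraphMode = CUDAGraphMode.NONE
    cudagraph_num_of_warmups: int = 0


def update_compilation_config(cfg: CompilationConfig,
                              use_v1: bool = True) -> CompilationConfig:
    # Assumed mapping: derive the graph mode from the level so the two
    # fields cannot disagree.
    if cfg.level == CompilationLevel.PIECEWISE:
        cfg.cudagraph_mode = CUDAGraphMode.PIECEWISE
    else:
        cfg.cudagraph_mode = CUDAGraphMode.NONE
    # The commit message states that one warmup is set when V1 is enabled.
    if use_v1:
        cfg.cudagraph_num_of_warmups = 1
    return cfg
```

Deriving `cudagraph_mode` from `level` in one place means later code, such as the dispatcher call in the diff below, reads a single consistent value instead of a stale default.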
2025-08-27 09:30:25 +08:00
parent f22077daa6
commit a9e78a3299
3 changed files with 118 additions and 34 deletions
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -1660,6 +1660,10 @@ class NPUModelRunner(LoRAModelRunnerMixin):
        moe_comm_method = (self.moe_comm_method
                           if num_input_tokens <= self.mc2_tokens_capacity else
                           self.fallback_moe_comm_method)
+        batch_descriptor = BatchDescriptor(num_tokens=num_input_tokens,
+                                           uniform_decode=False)
+        aclgraph_runtime_mode, batch_descriptor = \
+            self.aclgraph_dispatcher.dispatch(batch_descriptor)

        # Run forward pass
        with ProfileExecuteDuration().capture_async("forward"):
@@ -1671,6 +1675,8 @@ class NPUModelRunner(LoRAModelRunnerMixin):
                    with_prefill=self.with_prefill,
                    reserved_mc2_mask=self.reserved_mc2_mask,
                    moe_comm_method=moe_comm_method,
+                    aclgraph_runtime_mode=aclgraph_runtime_mode,
+                    batch_descriptor=batch_descriptor,
                    num_actual_tokens=scheduler_output.
                    total_num_scheduled_tokens):
                self.maybe_setup_kv_connector(scheduler_output)