[Aclgraph] Update compilation config in check_and_update_config (#2540)
### What this PR does / why we need it?
This PR updates the compilation config in `check_and_update_config`:
`compilation_config.cudagraph_mode` is now derived from
`compilation_config.level`, so the two settings stay consistent.
It also sets `compilation_config.cudagraph_num_of_warmups = 1` when V1 is
enabled, because this value is also used in torchair graph mode. This fixes
https://github.com/vllm-project/vllm-ascend/issues/2523
In addition, it fixes the bug where `aclgraphmode` was always `NONE` during
the forward pass in aclgraph mode.
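To illustrate the idea of deriving the graph mode from the compilation level, here is a minimal, self-contained sketch. It does not use vLLM's real classes: `CompilationLevel`, `CUDAGraphMode`, `CompilationConfig`, and `sync_graph_mode` below are simplified, hypothetical stand-ins, and the exact mapping chosen in this PR may differ.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CompilationLevel(Enum):
    # Hypothetical stand-in; only the two levels needed for the sketch.
    NO_COMPILATION = auto()
    PIECEWISE = auto()


class CUDAGraphMode(Enum):
    # Hypothetical stand-in for the graph capture mode.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()


@dataclass
class CompilationConfig:
    # Hypothetical stand-in holding only the fields named in the description.
    level: CompilationLevel
    cudagraph_mode: CUDAGraphMode = CUDAGraphMode.NONE
    cudagraph_num_of_warmups: int = 0


def sync_graph_mode(config: CompilationConfig,
                    use_v1: bool = True) -> CompilationConfig:
    """Make cudagraph_mode follow level (assumed mapping, for illustration)."""
    if config.level == CompilationLevel.PIECEWISE:
        config.cudagraph_mode = CUDAGraphMode.PIECEWISE
    else:
        # Without piecewise compilation, fall back to eager execution.
        config.cudagraph_mode = CUDAGraphMode.NONE
    if use_v1:
        # One warmup run, as the description states for the V1 case.
        config.cudagraph_num_of_warmups = 1
    return config
```

With this sketch, a config whose mode disagrees with its level is normalized before use, which is the kind of inconsistency the description says the change guards against.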
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.10.1.1
- vLLM main: f58675bfb3
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
@@ -1660,6 +1660,10 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         moe_comm_method = (self.moe_comm_method
                            if num_input_tokens <= self.mc2_tokens_capacity else
                            self.fallback_moe_comm_method)
+        batch_descriptor = BatchDescriptor(num_tokens=num_input_tokens,
+                                           uniform_decode=False)
+        aclgraph_runtime_mode, batch_descriptor = \
+            self.aclgraph_dispatcher.dispatch(batch_descriptor)
 
         # Run forward pass
         with ProfileExecuteDuration().capture_async("forward"):
@@ -1671,6 +1675,8 @@ class NPUModelRunner(LoRAModelRunnerMixin):
                     with_prefill=self.with_prefill,
                     reserved_mc2_mask=self.reserved_mc2_mask,
                     moe_comm_method=moe_comm_method,
+                    aclgraph_runtime_mode=aclgraph_runtime_mode,
+                    batch_descriptor=batch_descriptor,
                     num_actual_tokens=scheduler_output.
                     total_num_scheduled_tokens):
                 self.maybe_setup_kv_connector(scheduler_output)
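The diff builds a `BatchDescriptor`, passes it to `self.aclgraph_dispatcher.dispatch(...)`, and forwards the returned runtime mode and descriptor into the forward context. As a rough illustration of that dispatch pattern only, the sketch below uses hypothetical stand-ins (`RuntimeMode`, a local `BatchDescriptor`, `ToyDispatcher`). The exact-match lookup is an assumption and is not taken from the real dispatcher.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Optional, Tuple


class RuntimeMode(Enum):
    # Hypothetical stand-in for the graph runtime mode.
    NONE = auto()
    PIECEWISE = auto()


@dataclass(frozen=True)
class BatchDescriptor:
    # Hypothetical stand-in; frozen so instances are hashable set keys.
    num_tokens: int
    uniform_decode: bool = False


class ToyDispatcher:
    """Toy dispatcher: replay a graph only for descriptors seen at capture."""

    def __init__(self, captured: Iterable[BatchDescriptor]) -> None:
        self.captured = set(captured)

    def dispatch(
        self, desc: BatchDescriptor
    ) -> Tuple[RuntimeMode, Optional[BatchDescriptor]]:
        # Assumed rule: an exact match selects graph replay, anything
        # else falls back to eager execution with no descriptor.
        if desc in self.captured:
            return RuntimeMode.PIECEWISE, desc
        return RuntimeMode.NONE, None


dispatcher = ToyDispatcher([BatchDescriptor(8), BatchDescriptor(16)])
mode, desc = dispatcher.dispatch(BatchDescriptor(num_tokens=8,
                                                 uniform_decode=False))
```

The point of the pattern is that the mode is decided per batch at run time. If the caller never passes the dispatched mode on, the forward pass has no way to know a captured graph applies, which matches the symptom described in the commit message.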