[Bugfix] Fix in_profile_run in mtp_proposer dummy_run (#5165)

### What this PR does / why we need it?
This PR fixes a failure of `enable_force_load_balance` caused by
`in_profile_run` missing from `dummy_run` of `mtp_proposer`.
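
As a rough illustration of this class of bug (a toy sketch, not the actual vllm-ascend code): if a runner's dummy run receives a profile flag but does not forward it to the drafter, the drafter silently falls back to its default. `ToyDrafter`, `ToyRunner`, and the default value `is_profile=False` are hypothetical names and assumptions chosen for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrafter:
    """Hypothetical stand-in for a draft-model proposer."""
    seen_profile_flags: list = field(default_factory=list)

    def dummy_run(self, num_tokens: int, is_profile: bool = False) -> None:
        # Record the flag so the caller can check what the drafter was told.
        self.seen_profile_flags.append(is_profile)


@dataclass
class ToyRunner:
    """Hypothetical stand-in for a model runner that owns a drafter."""
    drafter: ToyDrafter

    def dummy_run_before_fix(self, num_tokens: int, is_profile: bool = False) -> None:
        # The flag is dropped here, so the drafter uses its default (False).
        self.drafter.dummy_run(num_tokens)

    def dummy_run_after_fix(self, num_tokens: int, is_profile: bool = False) -> None:
        # The flag is forwarded, so the drafter knows it is in a profile run.
        self.drafter.dummy_run(num_tokens, is_profile=is_profile)


drafter = ToyDrafter()
runner = ToyRunner(drafter)
runner.dummy_run_before_fix(8, is_profile=True)
runner.dummy_run_after_fix(8, is_profile=True)
```

After both calls, `drafter.seen_profile_flags` holds one entry per call, which makes it easy to see that only the second call carried the profile flag through.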

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested by CI.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: Zetong Li <slippersss@126.com>
Zetong Li
2025-12-18 22:27:47 +08:00
committed by GitHub
parent 7d32371b7e
commit 2304218f90
5 changed files with 12 additions and 6 deletions

@@ -2164,7 +2164,8 @@ class NPUModelRunner(GPUModelRunner):
 aclgraph_runtime_mode=aclgraph_runtime_mode,
 batch_descriptor=batch_descriptor,
 dummy_compute_logits=dummy_drafter_compute_logits,
-in_graph_capturing=not force_attention)
+in_graph_capturing=not force_attention,
+is_profile=is_profile)
 if is_profile and self.dynamic_eplb:
 self.model.clear_all_moe_loads()
 if not is_profile and self.dynamic_eplb: