[Fix] Adjust use_aclgraph logic (#2156)

### What this PR does / why we need it?

Updates the FusedMoE method to determine whether to use ACL Graph based on the `torchair_graph_config`.

This is equivalent to #2154 on v0.9.1-dev.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

None needed.

- vLLM version: v0.10.0
- vLLM main: ad57f23f6a

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-08-04 15:23:20 +08:00
parent 688350a3bb
commit a9480d5f0a
2 changed files with 11 additions and 2 deletions
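
The gating described in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this commit: `TorchairGraphConfig`, its `enabled` field, the `enforce_eager` argument, and both helper functions are hypothetical stand-ins, and nested lists replace tensors so the sketch runs without `torch`. It shows the forced load-balance randomization being skipped whenever ACL Graph is in use.

```python
import random
from dataclasses import dataclass


@dataclass
class TorchairGraphConfig:
    # Hypothetical stand-in for the real config object; the field name
    # `enabled` is an assumption made for this sketch.
    enabled: bool = False


def resolve_use_aclgraph(torchair_graph_config, enforce_eager=False):
    # Assumption: ACL Graph is considered active only when torchair graph
    # mode is disabled and eager execution is not forced.
    return not torchair_graph_config.enabled and not enforce_eager


def maybe_randomize_topk_ids(topk_ids, global_num_experts,
                             enable_force_load_balance, use_aclgraph,
                             rng=None):
    # Mirrors the condition in the diff: randomize expert ids for load
    # balance only when requested and ACL Graph is not in use.
    if enable_force_load_balance and not use_aclgraph:
        rng = rng or random.Random()
        # Same shape as the input, each id drawn from [0, global_num_experts).
        return [[rng.randrange(global_num_experts) for _ in row]
                for row in topk_ids]
    return topk_ids


topk_ids = [[0, 1], [2, 3]]
use_aclgraph = resolve_use_aclgraph(TorchairGraphConfig(enabled=False))
# With ACL Graph active, the ids pass through unchanged.
unchanged = maybe_randomize_topk_ids(topk_ids, 8, True, use_aclgraph)
# With torchair graph mode enabled, ACL Graph is off and ids are randomized.
randomized = maybe_randomize_topk_ids(
    topk_ids, 8, True,
    resolve_use_aclgraph(TorchairGraphConfig(enabled=True)),
    rng=random.Random(0),
)
```

In the actual method the randomization uses `torch.randint_like(topk_ids, 0, global_num_experts)`, as shown in the diff below; how `self.use_aclgraph` is set is not visible in this hunk.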
--- a/vllm_ascend/ops/fused_moe.py
+++ b/vllm_ascend/ops/fused_moe.py
@@ -1105,7 +1105,7 @@ class AscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
        # this is a naive implementation for experts load balance so as
        # to avoid accumulating too much tokens on a single rank.
        # currently it is only activated when doing profile runs.
-        if enable_force_load_balance:
+        if enable_force_load_balance and not self.use_aclgraph:
            topk_ids = torch.randint_like(topk_ids, 0, global_num_experts)

        fused_moe_state = get_forward_context().fused_moe_state