【main】【bugfix】fix: restrict default MLAPO activation to Decode nodes only (#6451)

### What this PR does / why we need it?

There is an issue with the current default logic for MLAPO (MLA Policy Optimization). By design, MLAPO should only be enabled by default on Decode (D) nodes. However, in hybrid (collocated prefill and decode) scenarios, the strategy is erroneously activated during the Prefill stage.

This PR corrects the default behavior to ensure that MLAPO is exclusively enabled for the Decoding phase. This prevents unexpected policy interference during Prefill and ensures optimal performance in hybrid deployment environments.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: fems14 <1804143737@qq.com>
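The intended default can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the body of `enabling_mlapo` is not shown in the diff, so `SketchKVTransferConfig` and `default_mlapo_enabled` are hypothetical stand-ins. The `is_kv_producer` field mirrors the attribute visible in the diff. The `is_kv_consumer` field, and the choice to treat a node with no KV-transfer config or with both roles as hybrid, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchKVTransferConfig:
    """Hypothetical stand-in for a KV-transfer config (not the vLLM class)."""

    is_kv_producer: bool = False  # node produces KV cache (Prefill role)
    is_kv_consumer: bool = False  # assumed field: node consumes KV cache (Decode role)


def default_mlapo_enabled(kv_transfer_config: Optional[SketchKVTransferConfig]) -> bool:
    """Return True only for a pure Decode node under the assumptions above."""
    # No KV-transfer config: treated here as a hybrid (collocated) deployment,
    # where Prefill also runs, so MLAPO stays off by default.
    if kv_transfer_config is None:
        return False
    # Decode-only: consumes KV cache and does not produce it.
    return kv_transfer_config.is_kv_consumer and not kv_transfer_config.is_kv_producer


if __name__ == "__main__":
    print(default_mlapo_enabled(None))
    print(default_mlapo_enabled(SketchKVTransferConfig(is_kv_consumer=True)))
    print(default_mlapo_enabled(SketchKVTransferConfig(is_kv_producer=True)))
```

Under these assumptions only the second call reports MLAPO as enabled, which matches the behavior described above: on by default for Decode nodes, off for Prefill and hybrid nodes.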
2026-01-31 22:44:56 +08:00
parent ef02d20086
commit 775fbc4cd2
2 changed files with 8 additions and 4 deletions
--- a/vllm_ascend/attention/mla_v1.py
+++ b/vllm_ascend/attention/mla_v1.py
@@ -22,7 +22,7 @@ from vllm_ascend.attention.utils import (
    AscendCommonAttentionMetadata,
    ascend_chunked_prefill_workspace_size,
    enable_cp,
-    enabling_malpo,
+    enabling_mlapo,
    maybe_save_kv_layer_to_connector,
    split_decodes_and_prefills,
    trans_rope_weight,
@@ -710,7 +710,7 @@ class AscendMLAImpl(MLAAttentionImpl):
        self.ring_mla_mask_size = 512

        self.speculative_config = self.vllm_config.speculative_config
-        self.enable_mlapo = enabling_malpo(self.vllm_config)
+        self.enable_mlapo = enabling_mlapo(self.vllm_config)

        self.is_kv_producer = (
            self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.is_kv_producer