Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087)

### What this PR does / why we need it?

A new kv_role "kv_both" is added to run mixed deployment scenarios. The mixed deployment will involve a decode phase, where with_prefill should be false.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: c60e6137f0

Signed-off-by: fems14 <1804143737@qq.com>
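
The condition changed in `model_runner_v1.py` can be read in isolation as a small decision function. The sketch below is illustrative only: `resolve_with_prefill` and `ROLE_FLAGS` are hypothetical names, not part of vllm-ascend, and the mapping assumes that the "kv_both" role sets both the producer and the consumer flag.

```python
# Hypothetical mapping from kv_role to (is_kv_producer, is_kv_consumer).
# Assumption: "kv_both" marks the node as producer and consumer at once.
ROLE_FLAGS = {
    "kv_producer": (True, False),
    "kv_consumer": (False, True),
    "kv_both": (True, True),
}


def resolve_with_prefill(kv_role: str, with_prefill: bool) -> bool:
    """Mirror the patched check: force prefill only on pure producers."""
    is_kv_producer, is_kv_consumer = ROLE_FLAGS[kv_role]
    if is_kv_producer and not is_kv_consumer:
        return True
    # Consumers and mixed ("kv_both") nodes keep the caller's value,
    # so a decode-phase dummy run stays with_prefill=False.
    return with_prefill


for role in ROLE_FLAGS:
    print(role, resolve_with_prefill(role, with_prefill=False))
```

Under these assumptions, only a pure producer overrides the flag; a mixed node running a decode-phase dummy run keeps `with_prefill=False`, which is the behaviour the description asks for.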
2025-09-22 20:36:41 +08:00
parent 37a0715eda
commit 1c9f0fe26f
3 changed files with 12 additions and 6 deletions
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -2406,7 +2406,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
                                        dtype=np.int32)

        # Force dummy run on prefill stage when this node is deemed as kv producer.
-        if self.is_kv_producer:
+        if self.is_kv_producer and not self.is_kv_consumer:
            with_prefill = True

        attn_metadata = self._build_attention_metadata(