Fix DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087)
### What this PR does / why we need it?
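A new kv_role, "kv_both", is added to run mixed deployment scenarios, in which a single node acts as both a KV producer and a KV consumer. A mixed deployment also goes through a decode phase, where with_prefill should be False. Previously, the dummy run forced with_prefill = True whenever the node was a KV producer, which also matched "kv_both" nodes. This PR restricts that override to nodes that are KV producers but not KV consumers.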
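To make the role semantics concrete, here is a minimal Python sketch of how kv_role could map onto the is_kv_producer / is_kv_consumer flags that the runner checks. The mapping is an assumption modeled on vLLM's KVTransferConfig, where "kv_both" counts as both a producer and a consumer; KVRoleFlags and flags_for_role are hypothetical names, not part of vllm-ascend.

```python
from dataclasses import dataclass

# Assumed mapping (modeled on vLLM's KVTransferConfig): "kv_both" belongs
# to both sets, so a mixed-deployment node is a producer AND a consumer.
PRODUCER_ROLES = frozenset({"kv_producer", "kv_both"})
CONSUMER_ROLES = frozenset({"kv_consumer", "kv_both"})


@dataclass(frozen=True)
class KVRoleFlags:
    """Hypothetical container for the two flags the model runner checks."""
    is_kv_producer: bool
    is_kv_consumer: bool


def flags_for_role(kv_role: str) -> KVRoleFlags:
    """Derive the producer/consumer flags from a kv_role string."""
    if kv_role not in PRODUCER_ROLES | CONSUMER_ROLES:
        raise ValueError(f"unknown kv_role: {kv_role!r}")
    return KVRoleFlags(
        is_kv_producer=kv_role in PRODUCER_ROLES,
        is_kv_consumer=kv_role in CONSUMER_ROLES,
    )


if __name__ == "__main__":
    for role in ("kv_producer", "kv_consumer", "kv_both"):
        print(role, flags_for_role(role))
```

Running it shows that a "kv_both" node reports True for both flags, which is why a check on is_kv_producer alone cannot tell it apart from a pure producer.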
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: c60e6137f0
Signed-off-by: fems14 <1804143737@qq.com>
```diff
@@ -2406,7 +2406,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
                                             dtype=np.int32)
 
             # Force dummy run on prefill stage when this node is deemed as kv producer.
-            if self.is_kv_producer:
+            if self.is_kv_producer and not self.is_kv_consumer:
                 with_prefill = True
 
             attn_metadata = self._build_attention_metadata(
```
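The before/after behaviour of the changed condition can be checked with a small truth table. This is a hedged sketch: ROLE_FLAGS, forced_prefill_before, forced_prefill_after and dummy_run_with_prefill are hypothetical helpers that only mirror the changed `if` line, assuming "kv_both" sets both flags; they are not the NPUModelRunner API.

```python
# Assumed kv_role -> (is_kv_producer, is_kv_consumer) mapping;
# "kv_both" sets both flags.
ROLE_FLAGS = {
    "kv_producer": (True, False),
    "kv_consumer": (False, True),
    "kv_both": (True, True),
}


def forced_prefill_before(is_kv_producer: bool, is_kv_consumer: bool) -> bool:
    """Condition before the fix: every KV producer, including "kv_both"."""
    return is_kv_producer


def forced_prefill_after(is_kv_producer: bool, is_kv_consumer: bool) -> bool:
    """Condition after the fix: only pure producers force the prefill path."""
    return is_kv_producer and not is_kv_consumer


def dummy_run_with_prefill(kv_role: str, with_prefill: bool) -> bool:
    """Hypothetical stand-in for the patched branch of the dummy run."""
    is_kv_producer, is_kv_consumer = ROLE_FLAGS[kv_role]
    if forced_prefill_after(is_kv_producer, is_kv_consumer):
        with_prefill = True
    return with_prefill


if __name__ == "__main__":
    for role, (producer, consumer) in ROLE_FLAGS.items():
        print(
            f"{role:12} before={forced_prefill_before(producer, consumer)!s:5} "
            f"after={forced_prefill_after(producer, consumer)!s:5} "
            f"decode with_prefill={dummy_run_with_prefill(role, False)}"
        )
```

Only the "kv_both" row changes: before the fix it was forced onto the prefill path, and after it the decode value with_prefill=False is preserved, while pure producers still get with_prefill=True.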