[v0.11.0-dev][bugfix] Add branch for stream up-lifting in update_attn_params (#4437)

### What this PR does / why we need it?
#3985 moved stream context initialization before the for-loops to improve
performance. However, we found that this can cause an accuracy drop when
used with PD disaggregation. This PR therefore partly reverts that change
when PD disaggregation is in use; we will fix the underlying bug in the future.
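
A minimal sketch of the branch described above, assuming that "PD disaggregation is in use" is detected by `kv_transfer_config` being set (the diff passes `self.vllm_config.kv_transfer_config` into `update_attn_params`). `RecordingStream` and `update_params_sketch` are hypothetical stand-ins for illustration only; they are not the actual NPU stream API or the real function body.

```python
from contextlib import contextmanager


class RecordingStream:
    """Hypothetical stand-in for a device stream; counts context entries."""

    def __init__(self):
        self.enter_count = 0

    @contextmanager
    def context(self):
        # Each `with stream.context():` models one stream-context setup.
        self.enter_count += 1
        yield self


def update_params_sketch(stream, layer_keys, kv_transfer_config=None):
    """Run a per-layer update, choosing where the stream context is entered.

    Assumption: kv_transfer_config is None when PD disaggregation is off.
    """
    updated = []
    if kv_transfer_config is None:
        # Lifted path (as in #3985): enter the context once, before the loop.
        with stream.context():
            for key in layer_keys:
                updated.append(key)
    else:
        # Reverted path for PD disaggregation: enter the context per iteration.
        for key in layer_keys:
            with stream.context():
                updated.append(key)
    return updated
```

Both paths perform the same per-layer work; only the number of stream-context entries differs, which is the performance trade-off this PR accepts for the PD disaggregation case.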

### Does this PR introduce _any_ user-facing change?
No.


---------

Signed-off-by: Angazenn <supperccell@163.com>
Angazenn
2025-12-08 08:54:46 +08:00
committed by GitHub
parent 2598124e67
commit 6391f0625f
2 changed files with 80 additions and 23 deletions

@@ -1598,7 +1598,8 @@ class NPUModelRunner(LoRAModelRunnerMixin):
 self.speculative_config)
 else:
 update_attn_params(self.update_stream, forward_context,
-maybe_padded_num_tokens)
+maybe_padded_num_tokens,
+self.vllm_config.kv_transfer_config)
 if get_forward_context().sp_enabled:
 hidden_states = tensor_model_parallel_all_gather(hidden_states, 0)
@@ -2359,7 +2360,8 @@ class NPUModelRunner(LoRAModelRunnerMixin):
 num_tokens, self.speculative_config)
 else:
 update_attn_params(self.update_stream, forward_context,
-num_tokens)
+num_tokens,
+self.vllm_config.kv_transfer_config)
 if self.drafter and self.drafter.name == SpecDcodeType.EAGLE3:
 hidden_states, _ = hidden_states