modify: Eliminate redundant operations in the code to improve performance (#137)

### What this PR does / why we need it?
Eliminate redundant operations in the code to improve performance.
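
As an illustration, the model runner hunk in this commit copies `input_positions` onto the attention metadata before the forward pass. The following is a minimal sketch of that pattern only. `AttnMetadata`, `ModelInput`, and `attach_positions` are hypothetical stand-ins, not the real vLLM classes, and how the attention layers later read the field is not shown in this commit excerpt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttnMetadata:
    """Hypothetical stand-in for the real attention metadata class."""
    input_positions: Optional[List[int]] = None


@dataclass
class ModelInput:
    """Hypothetical stand-in for the runner's model input."""
    input_tokens: List[int]
    input_positions: List[int]
    attn_metadata: Optional[AttnMetadata] = None


def attach_positions(model_input: ModelInput) -> None:
    """Store the positions on the metadata once, before the forward pass."""
    # Same guard as in the diff: the metadata may be None.
    if model_input.attn_metadata is not None:
        # Plain reference assignment, no copy is made.
        model_input.attn_metadata.input_positions = model_input.input_positions
```

Because the assignment only stores a reference, it adds no copy of the positions. The `None` guard keeps runs without attention metadata working unchanged.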

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
---------

Signed-off-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Yaphets24
2025-02-22 17:43:42 +08:00
committed by GitHub
parent 202b39a38c
commit d0b3cb4fa7
4 changed files with 52 additions and 29 deletions

@@ -1137,6 +1137,8 @@ class NPUModelRunner(NPUModelRunnerBase[ModelInputForNPUWithSamplingMetadata]):
         if not bypass_model_exec:
             with set_forward_context(model_input.attn_metadata,
                                      self.vllm_config, virtual_engine):
+                if model_input.attn_metadata is not None:
+                    model_input.attn_metadata.input_positions = model_input.input_positions
                 hidden_or_intermediate_states = model_executable(
                     input_ids=model_input.input_tokens,
                     positions=model_input.input_positions,