[Perf] Avoid performing index selection of sin/cos cache every layer (#1890)

Reduce the number of index selections performed on the sin/cos cache, so that the selection is no longer repeated in every layer.
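To illustrate the idea, here is a minimal pure-Python sketch, not the code from this commit. It compares gathering the sin/cos rows inside every layer with gathering them once and reusing the result. `RotaryCache`, `rotate`, `forward_per_layer`, and `forward_hoisted` are hypothetical names introduced for this example only; the real implementation works on NPU tensors and attention metadata.

```python
import math


class RotaryCache:
    """Toy sin/cos cache (hypothetical): one row per position, plus a lookup counter."""

    def __init__(self, max_positions, rotary_dim, base=10000.0):
        half = rotary_dim // 2
        inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_positions)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_positions)]
        self.num_selections = 0

    def select(self, positions):
        """Gather the cos/sin rows for the given positions (the index selection)."""
        self.num_selections += 1
        return [self.cos[p] for p in positions], [self.sin[p] for p in positions]


def rotate(x, cos, sin):
    """Apply a rotary embedding to one vector, treated as two halves."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def forward_per_layer(cache, positions, vectors, num_layers):
    """Baseline: every layer repeats the same index selection."""
    for _ in range(num_layers):
        cos, sin = cache.select(positions)
        vectors = [rotate(v, c, s) for v, c, s in zip(vectors, cos, sin)]
    return vectors


def forward_hoisted(cache, positions, vectors, num_layers):
    """Optimized: select once, then share the rows across all layers."""
    cos, sin = cache.select(positions)
    for _ in range(num_layers):
        vectors = [rotate(v, c, s) for v, c, s in zip(vectors, cos, sin)]
    return vectors
```

Because the positions do not change between layers within one step, both functions return the same values. The hoisted version performs one selection regardless of the layer count, while the baseline performs one per layer.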

- vLLM version: v0.10.0
- vLLM main: 656c24f1b5

Signed-off-by: whx-sjtu <2952154980@qq.com>
whx
2025-07-29 18:06:45 +08:00
committed by GitHub
parent 0190b68f51
commit 98cadc2146
3 changed files with 73 additions and 22 deletions

@@ -1799,6 +1799,9 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         attn_metadata.decode.input_positions)
     torch._dynamo.mark_static(
         get_forward_context().mc2_mask)
+    if hasattr(attn_metadata.decode, "sin"):
+        torch._dynamo.mark_static(attn_metadata.decode.sin)
+        torch._dynamo.mark_static(attn_metadata.decode.cos)
     torch._dynamo.mark_static(attn_metadata.slot_mapping)
     for kv in self.kv_caches:
         assert isinstance(