[bugfix] fix deepseek rope sincoscache re-generation (#2744)
### What this PR does / why we need it?
The current implementation regenerates `sin_cos_cache` in RoPE whenever
`kv_seqlen` exceeds 4k, because the cache is initialized with a length
of only 4k.
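
To illustrate the regeneration described above, here is a minimal, self-contained sketch in plain Python. `ToyRotaryCache`, `build_sin_cos_cache`, `forward_lazy`, and the sizes used are hypothetical names and values chosen for illustration; they are not the vllm-ascend implementation. The sketch also assumes that the fix amounts to sizing the cache once up front so that forward only performs lookups.

```python
import math


def build_sin_cos_cache(max_len, rotary_dim, base=10000.0):
    """Build per-position cos/sin rows using the standard RoPE frequencies."""
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_len)]
    return cos, sin


class ToyRotaryCache:
    """Hypothetical stand-in for a rotary embedding with a sin/cos cache."""

    def __init__(self, rotary_dim, init_len):
        self.rotary_dim = rotary_dim
        self.builds = 0  # counts how often the cache is (re)generated
        self._build(init_len)

    def _build(self, length):
        self.cos, self.sin = build_sin_cos_cache(length, self.rotary_dim)
        self.cached_len = length
        self.builds += 1

    def forward_lazy(self, positions, max_seq_len):
        # Pattern with the problem: the cache is too short, so forward
        # has to regenerate it once the sequence outgrows it.
        if max_seq_len > self.cached_len:
            self._build(max_seq_len)
        return [(self.cos[p], self.sin[p]) for p in positions]

    def forward(self, positions):
        # Assumed pattern after the fix: forward only indexes the cache.
        return [(self.cos[p], self.sin[p]) for p in positions]


# A cache initialized too short is rebuilt inside forward.
lazy = ToyRotaryCache(rotary_dim=8, init_len=16)
lazy.forward_lazy([0, 20], max_seq_len=21)

# A cache sized up front is built exactly once.
presized = ToyRotaryCache(rotary_dim=8, init_len=64)
presized.forward([0, 20])
```

In this toy setup `lazy.builds` ends at 2 while `presized.builds` stays at 1, which mirrors why the `max_seq_len` argument is no longer needed at the call site once the cache does not grow in forward.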
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
After this PR is merged, `sin_cos_cache` no longer grows in the forward
function, so `test_native_rope_deepseek_forward_cache_handling` is no
longer necessary.
- vLLM version: v0.10.1.1
- vLLM main: 60f0843ef8
Signed-off-by: zzzzwwjj <1183291235@qq.com>
@@ -1198,9 +1198,7 @@ class AscendMLATorchairImpl(MLAAttentionImpl):
             else:
                 decode_q_pe[...], decode_k_pe[...] = self.rotary_emb(
                     attn_metadata.decode.input_positions,
-                    decode_q_pe.contiguous(),
-                    decode_k_pe,
-                    max_seq_len=attn_metadata.decode.max_seq_lens)
+                    decode_q_pe.contiguous(), decode_k_pe)
         if has_prefill:
             assert attn_metadata.prefill is not None
             prefill_q = self.q_proj(prefill_hs_or_q_c)[0]\
@@ -1225,9 +1223,7 @@ class AscendMLATorchairImpl(MLAAttentionImpl):
             else:
                 prefill_q_pe[...], prefill_k_pe[...] = self.rotary_emb(
                     attn_metadata.prefill.input_positions,
-                    prefill_q_pe.contiguous(),
-                    prefill_k_pe,
-                    max_seq_len=attn_metadata.prefill.max_seq_lens)
+                    prefill_q_pe.contiguous(), prefill_k_pe)
 
         assert len(
             kv_cache