[Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348)

### What this PR does / why we need it?
This PR fixes the padding logic in the eagle proposer for kimi25. The main
changes are:
1. Change how the draft model's attention builder and backend are obtained.
2. Pad the block table and slice the related tensors in the common metadata
when `draft_step > 1`, fixing a fused_infer_attention (FIA) verification error
(see the sketch after this list).
3. Replace the block table in `update_graph_params`, also fixing the FIA
verification error (shown in the diff excerpt below).
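
As a rough illustration of point 2, here is a minimal sketch of the padding and
slicing idea. The helper name `pad_block_table`, the field names, and
`graph_num_reqs` are hypothetical and not the code in this PR: the captured
graph expects a block table with a fixed number of request rows, so for draft
steps beyond the first the runtime block table is padded to that shape and the
per-request metadata tensors are sliced back down to the live requests.

```python
import torch


def pad_block_table(block_table: torch.Tensor, graph_num_reqs: int) -> torch.Tensor:
    """Pad (or truncate) the block table to the row count the captured graph expects."""
    num_reqs, max_blocks = block_table.shape
    if num_reqs >= graph_num_reqs:
        return block_table[:graph_num_reqs]
    pad_rows = block_table.new_zeros((graph_num_reqs - num_reqs, max_blocks))
    return torch.cat([block_table, pad_rows], dim=0)


def build_step_metadata(block_table, seq_lens, query_start_loc, num_reqs, graph_num_reqs):
    """Hypothetical per-draft-step metadata: padded block table, sliced per-request tensors."""
    return {
        "block_tables": pad_block_table(block_table, graph_num_reqs),
        "seq_lens": seq_lens[:num_reqs],
        "query_start_loc": query_start_loc[:num_reqs + 1],
    }
```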

- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: Zetong Li <slippersss@126.com>
Authored by Zetong Li on 2026-03-21 16:57:22 +08:00, committed by GitHub.
Commit 84a74f0cb1 (parent f482c314cf), 4 changed files with 51 additions and 29 deletions.


@@ -495,10 +495,12 @@ class AscendAttentionBackendImpl(AttentionImpl):
            # attn_metadata is indexed by draft step first, then by the layer key.
            draft_step = attn_count // num_layers
            seq_lens = attn_metadata[draft_step][key].seq_lens_list
            actual_seq_lengths_q = attn_metadata[draft_step][key].actual_seq_lengths_q
            block_tables = attn_metadata[draft_step][key].block_tables
            attn_count = attn_count + 1
        else:
            # Single draft step: attn_metadata is keyed by the layer key directly.
            seq_lens = attn_metadata[key].seq_lens_list
            actual_seq_lengths_q = attn_metadata[key].actual_seq_lengths_q
            block_tables = attn_metadata[key].block_tables
        torch.npu.graph_task_update_begin(update_stream, handle)
        torch_npu.npu_fused_infer_attention_score.out(
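
The excerpt above is truncated inside the kernel call. For context on where the
per-step `block_tables`, `seq_lens`, and `actual_seq_lengths_q` selected above
end up, here is a heavily abbreviated, assumed sketch of the graph task update
window. It is not the real call site: the kernel takes many more arguments
(heads, layout, scale, workspace, ...) than shown, and the tensor names are
placeholders.

```python
import torch
import torch_npu  # Ascend NPU extension; only usable on NPU hosts


def update_fia_graph_task(update_stream, handle, query, key_cache, value_cache,
                          block_tables, actual_seq_lengths_q, seq_lens, output):
    # Assumed sketch: replace the block table recorded in the captured graph's
    # parameters with the one selected for the current draft step, then re-issue
    # the fused-infer-attention kernel inside the update window.
    torch.npu.graph_task_update_begin(update_stream, handle)
    torch_npu.npu_fused_infer_attention_score.out(
        query, key_cache, value_cache,
        block_table=block_tables,                 # per-draft-step block table (this PR's fix)
        actual_seq_lengths=actual_seq_lengths_q,  # query lengths for this step
        actual_seq_lengths_kv=seq_lens,           # kv lengths for this step
        out=output,
    )
    torch.npu.graph_task_update_end(update_stream)
```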