[Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348)
### What this PR does / why we need it?
This PR fixes the padding logic in the eagle proposer for kimi25. The main
changes are:
1. modify how the draft model's attention builder and backend are obtained
2. add block table padding and the related tensor slicing to the common metadata
when `draft_step > 1`, fixing a fia verification error
3. replace the block table in `update_graph_params`, also fixing the fia
verification error
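The block-table padding described in change 2 can be sketched as follows. This is a minimal illustration of the general technique, not the actual vllm-ascend code; `pad_block_table`, `target_rows`, `target_cols`, and `pad_id` are hypothetical names:

```python
# Hypothetical sketch of block-table padding for graph replay.
# Captured graphs require static shapes, so when draft_step > 1 the
# per-step block table must be padded out to the shape the graph was
# captured with (and later sliced back to the real size).

def pad_block_table(block_table, target_rows, target_cols, pad_id=0):
    """Pad a 2-D block table (list of lists) to a fixed captured shape.

    Rows beyond target_rows are dropped; short rows are right-padded
    with pad_id; missing rows are filled entirely with pad_id.
    """
    padded = []
    for row in block_table[:target_rows]:
        padded.append(row[:target_cols] + [pad_id] * (target_cols - len(row)))
    while len(padded) < target_rows:
        padded.append([pad_id] * target_cols)
    return padded
```

The same fixed-shape constraint is why the related tensors (sequence lengths, query lengths) must be sliced consistently per draft step.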
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
Signed-off-by: Zetong Li <slippersss@126.com>
```
@@ -495,10 +495,12 @@ class AscendAttentionBackendImpl(AttentionImpl):
    draft_step = attn_count // num_layers
    seq_lens = attn_metadata[draft_step][key].seq_lens_list
    actual_seq_lengths_q = attn_metadata[draft_step][key].actual_seq_lengths_q
    block_tables = attn_metadata[draft_step][key].block_tables
    attn_count = attn_count + 1
else:
    seq_lens = attn_metadata[key].seq_lens_list
    actual_seq_lengths_q = attn_metadata[key].actual_seq_lengths_q
    block_tables = attn_metadata[key].block_tables

torch.npu.graph_task_update_begin(update_stream, handle)
torch_npu.npu_fused_infer_attention_score.out(
```
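The hunk above indexes the attention metadata first by draft step (`attn_count // num_layers`) and then by layer key when multiple draft steps are run, and by layer key alone otherwise. A minimal, self-contained sketch of that lookup; the `AttnMeta` dataclass and `select_meta` helper are illustrative stand-ins, not vllm-ascend APIs:

```python
# Sketch of per-draft-step metadata selection, assuming attn_metadata is
# either {draft_step: {layer_key: meta}} (multi-step) or {layer_key: meta}.
from dataclasses import dataclass


@dataclass
class AttnMeta:
    seq_lens_list: list
    actual_seq_lengths_q: list
    block_tables: list


def select_meta(attn_metadata, key, attn_count, num_layers, multi_step):
    """Return (meta, new_attn_count) for the current attention call.

    In multi-step drafting, each attention call advances attn_count, so
    attn_count // num_layers recovers which draft step is executing and
    which slice of metadata (seq lens, block tables) it should see.
    """
    if multi_step:
        draft_step = attn_count // num_layers
        return attn_metadata[draft_step][key], attn_count + 1
    return attn_metadata[key], attn_count
```

Keying the metadata by draft step is what lets `update_graph_params` hand each replayed step its own (padded) block table instead of reusing step 0's.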