[Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348)

### What this PR does / why we need it?
This PR fixes the padding logic in the eagle proposer for kimi25. The main
changes are:
1. Change how the draft model's attention builder and backend are obtained.
2. Add block table padding and the related tensor slicing in the common
metadata when `draft_step>1`, to resolve the FIA verification error.
3. Replace the block table in `update_graph_params`, also to resolve the FIA
verification error.
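
As a rough illustration of item 2, the sketch below pads a block table to a fixed number of rows and slices a per-request list to the same length, so both agree on the row count. It is not the code from this PR: `pad_block_table`, `slice_per_request`, and the pad value `0` are hypothetical names and choices, and plain Python lists stand in for tensors.

```python
def pad_block_table(block_table, num_rows, pad_value=0):
    """Return a copy of a 2-D block table with exactly `num_rows` rows.

    Hypothetical helper: extra rows are filled with `pad_value`,
    surplus rows are dropped.
    """
    if not block_table:
        raise ValueError("block_table must have at least one row")
    width = len(block_table[0])
    # Copy at most `num_rows` existing rows.
    rows = [list(row) for row in block_table[:num_rows]]
    # Append padding rows until the table has `num_rows` rows.
    rows.extend([pad_value] * width for _ in range(num_rows - len(rows)))
    return rows


def slice_per_request(values, num_rows):
    """Keep only the first `num_rows` entries of a per-request list."""
    return list(values[:num_rows])


# Two real requests, padded to four rows; the per-request list is
# sliced so its length matches the padded table.
padded = pad_block_table([[3, 7], [5, 0]], num_rows=4)
seq_lens = slice_per_request([9, 4, 0, 0, 0], num_rows=4)
assert len(padded) == len(seq_lens) == 4
```

Keeping the two shapes in step is the only point of the sketch; the real change operates on the common attention metadata.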

- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: Zetong Li <slippersss@126.com>
Zetong Li
2026-03-21 16:57:22 +08:00
committed by GitHub
parent f482c314cf
commit 84a74f0cb1
4 changed files with 51 additions and 29 deletions

@@ -407,6 +407,7 @@ class TestEagleProposerDummyRun(TestBase):
mock_get_context.return_value = mock_return_context
mock_get_context_2.return_value = mock_return_context
self.proposer.use_cuda_graph = True
self.proposer.draft_attn_groups = [MagicMock()]
# cpu does not support `torch.ops.vllm.maybe_pad_and_reduce`
with set_current_vllm_config(self.vllm_config):
self.proposer.enable_shared_expert_dp = False