[Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348)
### What this PR does / why we need it?
This PR fixes the padding logic in the eagle proposer for kimi25. Main
changes:
1. modify how the draft model attention builder and backend are obtained
2. pad the block table and slice the related tensors in common metadata
when `draft_step > 1`, fixing the FIA verification error
3. replace the block table in `update_graph_params`, also fixing the FIA
verification error
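The block-table padding and slicing in change 2 can be sketched as follows. This is a hedged illustration, not the actual vLLM-Ascend code: the function name `pad_block_table` and the shapes are hypothetical, but the idea is the same, so that a captured graph always sees a fixed number of block-table rows, the real rows are zero-padded up to the graph batch size, and other per-request metadata tensors are sliced back to the real request count before verification.

```python
# Hypothetical sketch of block-table padding for a fixed graph batch size.
# Names (pad_block_table, target_batch) are illustrative, not from the PR.
import torch

def pad_block_table(block_table: torch.Tensor, target_batch: int) -> torch.Tensor:
    """Zero-pad the rows of a [num_reqs, max_blocks] block table so the
    captured graph always sees exactly `target_batch` rows."""
    num_reqs, max_blocks = block_table.shape
    if num_reqs >= target_batch:
        return block_table[:target_batch]
    pad = torch.zeros(target_batch - num_reqs, max_blocks, dtype=block_table.dtype)
    return torch.cat([block_table, pad], dim=0)

# Three real requests, padded up to a graph batch size of 8.
block_table = torch.tensor([[1, 2], [3, 4], [5, 0]])
padded = pad_block_table(block_table, target_batch=8)
print(padded.shape)  # torch.Size([8, 2])

# When draft_step > 1, only the first num_reqs rows are real, so related
# metadata tensors are sliced to the same length before verification:
seq_lens = torch.arange(8)
real_seq_lens = seq_lens[: block_table.shape[0]]
print(real_seq_lens.tolist())  # [0, 1, 2]
```

The padding keeps graph capture/replay shapes stable, while the slicing keeps the verification step from reading the padded (garbage) rows.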
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
Signed-off-by: Zetong Li <slippersss@126.com>
@@ -407,6 +407,7 @@ class TestEagleProposerDummyRun(TestBase):
        mock_get_context.return_value = mock_return_context
        mock_get_context_2.return_value = mock_return_context
        self.proposer.use_cuda_graph = True
        self.proposer.draft_attn_groups = [MagicMock()]
        # cpu does not support `torch.ops.vllm.maybe_pad_and_reduce`
        with set_current_vllm_config(self.vllm_config):
            self.proposer.enable_shared_expert_dp = False