[Feat] Merge the multi eagle graphs to one graph (#5940)
### What this PR does / why we need it?
This PR merges all steps of the draft model in fullgraph mode, avoiding the
synchronization between each graph and reducing the bubble time.
#### Key ideas:
- The "model forward" of step 0 (the first step) and of the remaining steps
is captured together as a single "Callable", rather than capturing each
step's model individually.
- "update_attn_params" is moved outside the entire graph: all the
"attn_metadata" required by every step is constructed before "replay", and
the "attn_params" of all steps are updated at once.
- Remove the synchronization between the main-model graph and the
draft-model graph.
#### Key params/functions:
- params: draft_attn_metadatas, attn_metadata_multi_steps,
slot_mapping_group
- functions: _run_merged_draft, attn_update_stack_num_spec_norm,
update_attn_params, _propose, dummy_run
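The key ideas above can be sketched without any framework code. This is an illustrative, simplified model of the control flow only, not the actual vllm-ascend implementation: the names `build_attn_metadata_multi_steps` and `merged_draft_forward` are hypothetical stand-ins for the real helpers, and the "forward pass" is faked with arithmetic.

```python
# Hypothetical sketch: merge all draft-model steps into one callable that a
# single graph capture/replay can cover, instead of one graph per step.

def build_attn_metadata_multi_steps(num_steps):
    """Construct the attention metadata for *every* draft step up front,
    so no metadata work (and no per-step sync) happens during replay."""
    return [{"step": i, "slot_mapping": list(range(i, i + 4))}
            for i in range(num_steps)]

def merged_draft_forward(hidden, attn_metadata_multi_steps):
    """One callable covering step 0 and all remaining steps.  Capturing
    this as a single graph removes the synchronization between steps."""
    draft_tokens = []
    for _meta in attn_metadata_multi_steps:
        hidden = hidden + 1          # stand-in for one draft forward pass
        draft_tokens.append(hidden)  # token proposed at this step
    return draft_tokens

# Before "replay": update the attn params of all steps at once ...
metadata = build_attn_metadata_multi_steps(num_steps=3)
# ... then run the merged callable as one unit, with no syncs in between.
tokens = merged_draft_forward(0, metadata)
print(tokens)  # [1, 2, 3]
```

The point of the sketch is structural: because every step's metadata exists before the merged callable runs, the whole multi-step proposal can be replayed as one graph.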
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
11b6af5280
Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>
```diff
@@ -295,6 +295,7 @@ class TestACLGraphWrapper(TestBase):
         mock_current_platform.get_global_graph_pool.return_value = self.mock_graph_pool
         mock_get_forward_context.return_value = self.mock_forward_context
         self.mock_forward_context.cudagraph_runtime_mode = CUDAGraphMode.FULL
+        self.mock_forward_context.is_draft_model = False

         # Mock torch.npu.NPUGraph
         mock_npu_graph = MagicMock()
```
```diff
@@ -366,6 +367,7 @@ class TestACLGraphWrapper(TestBase):
         mock_current_platform.get_global_graph_pool.return_value = self.mock_graph_pool
         mock_get_forward_context.return_value = self.mock_forward_context
         self.mock_forward_context.cudagraph_runtime_mode = CUDAGraphMode.FULL
+        self.mock_forward_context.is_draft_model = False

         # Mock torch.npu.NPUGraph
         mock_npu_graph = MagicMock()
```