[Refactor] Unify full-graph parameter update logic (#6041)

### What this PR does / why we need it?

**Refactor: Unify full-graph parameter update logic**

This PR consolidates the scattered full-graph parameter update logic into a unified approach, improving code architecture and eliminating duplication.

**Key improvements:**

1. **Unified interface**
   - Create `update_full_graph_params` as the single entry point for all full-graph updates
   - Replace multiple scattered update calls with one unified function
   - Remove ~50 lines of duplicated if-else logic across `model_runner_v1.py` and `eagle_proposer.py`
2. **Better architecture**
   - Move update logic to respective Backend classes (`AscendAttentionBackend`, `AscendMLABackend`)
   - Each Backend manages its own parameter update logic internally
   - Simplify caller code to just dispatch to the appropriate Backend
3. **Cleaner parameter handling**
   - Remove unnecessary `pcp_size` and `dcp_size` parameter passing
   - Get parallel configuration directly from distributed groups
   - Consistent with how other parts of the codebase obtain these values

**Why we need it:**

- **Maintainability**: Future changes only need to be made in one place per Backend
- **Code quality**: Follows DRY principle and Single Responsibility Principle
- **Readability**: Cleaner, more intuitive code structure

### Does this PR introduce _any_ user-facing change?

**No.** This is a pure refactoring with no functional changes - same behavior, cleaner code.

### How was this patch tested?

- All existing unit tests pass with updated mocks
- No new tests needed (pure refactoring, no behavior changes)
- CI validates correctness

---

- vLLM version: v0.13.0

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
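
For readers unfamiliar with the pattern, the "single entry point that dispatches to the Backend" idea might look roughly like the sketch below. This is illustrative only: the `*Sketch` classes, the `update_graph_params` method name, and the argument list are hypothetical stand-ins, not the actual signatures in this PR.

```python
from typing import Any, Protocol


class GraphParamUpdater(Protocol):
    """Hypothetical interface each backend would implement."""

    @staticmethod
    def update_graph_params(forward_context: Any, num_tokens: int) -> str: ...


class AttentionBackendSketch:
    """Stand-in for a backend such as AscendAttentionBackend (assumed shape)."""

    @staticmethod
    def update_graph_params(forward_context: Any, num_tokens: int) -> str:
        # Backend-specific update logic would live here.
        return f"attn:{num_tokens}"


class MLABackendSketch:
    """Stand-in for a backend such as AscendMLABackend (assumed shape)."""

    @staticmethod
    def update_graph_params(forward_context: Any, num_tokens: int) -> str:
        # Backend-specific update logic would live here.
        return f"mla:{num_tokens}"


def update_full_graph_params(backend: type[GraphParamUpdater],
                             forward_context: Any, num_tokens: int) -> str:
    # Single entry point: no if-else on backend type in the caller,
    # the chosen backend owns its own update logic.
    return backend.update_graph_params(forward_context, num_tokens)
```

Under these assumptions, a caller only picks the backend and calls `update_full_graph_params(backend, ctx, n)`, so adding a new backend would not require touching the caller.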
2026-01-24 20:12:57 +08:00
parent 8129c429ef
commit 8966a99710
10 changed files with 420 additions and 415 deletions
--- a/tests/ut/spec_decode/test_eagle_proposer.py
+++ b/tests/ut/spec_decode/test_eagle_proposer.py
@@ -333,11 +333,11 @@ class TestEagleProposerDummyRun(TestBase):
        self.proposer.dummy_run(num_tokens=64, with_prefill=True, num_reqs=4)
        self.assertTrue(self.proposer._runnable.call_count == 1)

-    @patch("vllm_ascend.spec_decode.eagle_proposer.update_attn_params")
+    @patch("vllm_ascend.spec_decode.eagle_proposer.update_full_graph_params")
    @patch("vllm_ascend.spec_decode.eagle_proposer.get_forward_context")
    @patch("vllm_ascend.spec_decode.eagle_proposer.set_ascend_forward_context")
    def test_dummy_run_in_graph_capture(self, mock_context, mock_get_context,
-                                        mock_update_attn_params):
+                                        mock_update_full_graph_params):
        last_use_cuda_graph = self.proposer.use_cuda_graph
        mock_return_context = MagicMock()
        mock_return_context.cudagraph_runtime_mode = CUDAGraphMode.FULL
@@ -352,14 +352,14 @@ class TestEagleProposerDummyRun(TestBase):
                                in_graph_capturing=True,
                                aclgraph_runtime_mode=CUDAGraphMode.FULL)
        self.assertTrue(self.proposer._runnable.call_count == 1)
-        mock_update_attn_params.assert_not_called()
+        mock_update_full_graph_params.assert_not_called()
        self.proposer.use_cuda_graph = last_use_cuda_graph

-    @patch("vllm_ascend.spec_decode.eagle_proposer.update_attn_params")
+    @patch("vllm_ascend.spec_decode.eagle_proposer.update_full_graph_params")
    @patch("vllm_ascend.spec_decode.eagle_proposer.get_forward_context")
    @patch("vllm_ascend.spec_decode.eagle_proposer.set_ascend_forward_context")
    def test_dummy_run_in_graph_run(self, mock_context, mock_get_context,
-                                    mock_update_attn_params):
+                                    mock_update_full_graph_params):
        last_use_cuda_graph = self.proposer.use_cuda_graph
        mock_return_context = MagicMock()
        mock_return_context.cudagraph_runtime_mode = CUDAGraphMode.FULL
@@ -374,7 +374,7 @@ class TestEagleProposerDummyRun(TestBase):
                                in_graph_capturing=False,
                                aclgraph_runtime_mode=CUDAGraphMode.FULL)
        self.assertTrue(self.proposer._runnable.call_count == 1)
-        self.assertTrue(mock_update_attn_params.call_count == 1)
+        self.assertTrue(mock_update_full_graph_params.call_count == 1)
        self.proposer.use_cuda_graph = last_use_cuda_graph