[Refactor][EAGLE] 7/N Merged PCP and disable_padded interface (#6811)

### What this PR does / why we need it?

[Refactor][EAGLE] 7/N Merged PCP and disable_padded interface into eagle_proposer.py

This pull request refactors the speculative decoding mechanism by merging the Parallel Context Processing (PCP) and Multi-Token Prediction (MTP) functionality directly into eagle_proposer.py. The changes aim to improve the efficiency and correctness of distributed speculative decoding, in particular by enabling the Eagle feature to work with the disable_padded interface. This involves adjustments to attention metadata, input/output processing, and state management so that the proposer operates correctly in parallel environments.

1. The PCP and MTP features are migrated to eagle_proposer.py.
2. The Eagle and PCP features are integrated.
3. The Eagle feature is enabled to use the disable_padded interface.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tests and UT

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2026-02-27 16:06:56 +08:00
parent e4458b2d2b
commit c13d90b766
6 changed files with 245 additions and 60 deletions
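As a reading aid for the mtp_proposer.py hunks in this diff, the change to the delegation condition can be sketched as two predicates. This is only an illustrative sketch: `GateInputs`, `delegates_before`, and `delegates_after` are hypothetical names that do not appear in the PR, and the dataclass fields merely stand in for the values the real condition reads from `self.pcp_size`, `self.dcp_size`, `self.speculative_config`, and `self.vllm_config.compilation_config.cudagraph_mode`.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    # Hypothetical stand-in for the proposer state read by the condition.
    pcp_size: int = 1
    dcp_size: int = 1
    disable_padded_drafter_batch: bool = False
    has_full_cudagraphs: bool = False


def delegates_before(s: GateInputs) -> bool:
    # Old condition: delegate to the parent (Eagle) path only without
    # PCP/DCP, with padded drafter batches, and without full graphs.
    return (
        s.pcp_size * s.dcp_size == 1
        and not s.disable_padded_drafter_batch
        and not s.has_full_cudagraphs
    )


def delegates_after(s: GateInputs) -> bool:
    # New condition: only full-graph mode still bypasses the parent path.
    return not s.has_full_cudagraphs


# Configurations that used to bypass the parent path and no longer do.
with_pcp = GateInputs(pcp_size=2)
unpadded = GateInputs(disable_padded_drafter_batch=True)
full_graph = GateInputs(has_full_cudagraphs=True)
```

Under these assumptions, `with_pcp` and `unpadded` are the cases whose behavior changes, while `full_graph` is still excluded by both the old and the new condition, matching the TODO comment kept in the diff.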
--- a/vllm_ascend/spec_decode/mtp_proposer.py
+++ b/vllm_ascend/spec_decode/mtp_proposer.py
@@ -39,11 +39,7 @@ class MtpProposer(EagleProposer):
        # Currently, both GLM and DS encounter issues when enabling the fullgraph mode and running on EagleProposer.
        # Therefore, we temporarily bypass this problem by adding a conditional check for fullgraph.
        # TODO: this conditional check should be removed after bug fixing.
-        if (
-            self.pcp_size * self.dcp_size == 1
-            and not self.speculative_config.disable_padded_drafter_batch
-            and not self.vllm_config.compilation_config.cudagraph_mode.has_full_cudagraphs()
-        ):
+        if not self.vllm_config.compilation_config.cudagraph_mode.has_full_cudagraphs():
            super().dummy_run(
                num_tokens,
                with_prefill,
@@ -175,11 +171,7 @@ class MtpProposer(EagleProposer):
        # Currently, both GLM and DS encounter issues when enabling the fullgraph mode and running on EagleProposer.
        # Therefore, we temporarily bypass this problem by adding a conditional check for fullgraph.
        # TODO: this conditional check should be removed after bug fixing.
-        if (
-            self.pcp_size * self.dcp_size == 1
-            and not self.speculative_config.disable_padded_drafter_batch
-            and not self.vllm_config.compilation_config.cudagraph_mode.has_full_cudagraphs()
-        ):
+        if not self.vllm_config.compilation_config.cudagraph_mode.has_full_cudagraphs():
            draft_token_ids = super()._propose(
                target_token_ids,
                target_positions,