[1/N][Eagle3] Aligns auxiliary hidden state usage for eagle3 models (#5162)

### What this PR does / why we need it?
This prepares for the migration to vLLM's `EagleProposer`, which does not have a `name` attribute. It is also a breakdown of #5100.

Introduces logic to determine whether eagle3 heads require auxiliary
hidden states based on configuration, ensuring consistent handling
across related components. Prevents incorrect assumptions for eagle3
variants that do not use auxiliary outputs, improving compatibility and
correctness.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
None.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Authored by: Yizhou, 2025-12-22 15:24:54 +08:00
Committed by: GitHub
Parent: b62b2ebd9b
Commit: 60d9398f6d
2 changed files with 33 additions and 4 deletions


```diff
@@ -45,6 +45,9 @@ class EagleProposer(Proposer):
         self.vllm_config = vllm_config
         self.device = device
         self.runner = runner
+        self.speculative_config = vllm_config.speculative_config
+        self.draft_model_config = self.speculative_config.draft_model_config
+        self.method = self.speculative_config.method
         self.block_size = vllm_config.cache_config.block_size
         # We need to get the hidden size from the draft model config because
@@ -99,6 +102,29 @@ class EagleProposer(Proposer):
             device="cpu",
             dtype=torch.int32)
         self.attn_mask_builder = AttentionMaskBuilder(self.device)
+        self.eagle3_use_aux_hidden_state: bool = (
+            self._get_eagle3_use_aux_hidden_state_from_config())
+
+    def _get_eagle3_use_aux_hidden_state_from_config(self) -> bool:
+        """
+        NOTE(2025-12-18): This is an explicit copy from vLLM's EagleProposer,
+        added only to align with its logic.
+        Some eagle3 heads (e.g., nvidia/gpt-oss-120b-Eagle3-v2) do not use
+        auxiliary hidden states and directly use the last layer output, just
+        like eagle1. They may indicate this by setting "use_aux_hidden_state"
+        to False inside the "eagle_config" dict of their hf_config.
+        """
+        if self.method != "eagle3":
+            return False
+        # Assume that eagle3 heads use aux hidden states by default.
+        use_aux_hidden_state = True
+        eagle_config = getattr(self.draft_model_config.hf_config,
+                               "eagle_config", None)
+        if eagle_config is not None:
+            use_aux_hidden_state = eagle_config.get("use_aux_hidden_state",
+                                                    True)
+        return use_aux_hidden_state
+
     def load_model(self, model: nn.Module) -> None:
         target_attn_layer_names = set(
```
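
The resolution rule added above can be exercised in isolation. A minimal sketch, using `types.SimpleNamespace` as a hypothetical stand-in for the draft model's `hf_config`; the free function below is illustrative and not part of the PR:

```python
from types import SimpleNamespace


def eagle3_use_aux_hidden_state(method: str, hf_config) -> bool:
    """Mirror of the PR's rule for whether an eagle3 head uses aux hidden states."""
    if method != "eagle3":
        # Only eagle3 heads can consume auxiliary hidden states.
        return False
    eagle_config = getattr(hf_config, "eagle_config", None)
    if eagle_config is not None:
        # Heads opt out by setting "use_aux_hidden_state": false.
        return eagle_config.get("use_aux_hidden_state", True)
    # No eagle_config present: default to using aux hidden states.
    return True


# A head like nvidia/gpt-oss-120b-Eagle3-v2 opts out via its hf_config:
opt_out = SimpleNamespace(eagle_config={"use_aux_hidden_state": False})
print(eagle3_use_aux_hidden_state("eagle3", opt_out))            # False
# Absent eagle_config, eagle3 defaults to aux hidden states:
print(eagle3_use_aux_hidden_state("eagle3", SimpleNamespace()))  # True
# Non-eagle3 methods never use them:
print(eagle3_use_aux_hidden_state("eagle", SimpleNamespace()))   # False
```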