[Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349)

### What this PR does / why we need it?

Overview: This pull request refactors speculative decoding for Eagle and
MTP proposers on Ascend hardware. It fixes a bug related to
draft_attn_metadatas being lost, migrates the lmhead feature, and adds
routing logic in MtpProposer.
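
For illustration, the draft_attn_metadatas fix amounts to passing the value
into the forward-context manager and attaching it to the context object. The
sketch below is a self-contained approximation, not the actual vllm-ascend
code: `set_forward_context_sketch`, `get_forward_context_sketch`, and the
`SimpleNamespace` context are hypothetical stand-ins.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for the module-level forward context.
_current_context = None


def get_forward_context_sketch():
    """Return the active context, or fail if none is set."""
    if _current_context is None:
        raise RuntimeError("forward context is not set")
    return _current_context


@contextmanager
def set_forward_context_sketch(attn_metadata, draft_attn_metadatas=None):
    """Install a context for one forward pass and restore the old one after."""
    global _current_context
    previous = _current_context
    ctx = SimpleNamespace(attn_metadata=attn_metadata)
    # Attach the draft metadata explicitly so downstream code can read it.
    ctx.draft_attn_metadatas = draft_attn_metadatas
    _current_context = ctx
    try:
        yield ctx
    finally:
        _current_context = previous
```

Code running inside the `with` block can then read
`get_forward_context_sketch().draft_attn_metadatas` instead of relying on the
value being threaded through separately.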

Details:
1. Migrated the lmhead feature from mtp to eagle and normalized it in
eagle_proposer.
2. Fixed the bug where draft_attn_metadatas was lost after enabling
eagle mode in the merge graph.
3. Added routing in MtpProposer based on pcp and disable padded drafter
batch: in mtp mode, when neither pcp nor disable padded drafter batch is
enabled, the normalized eagle_proposer implementation is used.
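
The routing rule in item 3 can be sketched as follows. This is a minimal
illustration under stated assumptions, not the code in this PR:
`SpecConfigSketch`, its field names, and `select_proposer_path` are
hypothetical, and the rule is read as "fall back to the mtp path if either
pcp or disable padded drafter batch is enabled".

```python
from dataclasses import dataclass


@dataclass
class SpecConfigSketch:
    """Hypothetical config holding only the fields the routing rule needs."""
    method: str  # "mtp" or "eagle"
    pcp_enabled: bool = False
    disable_padded_drafter_batch: bool = False


def select_proposer_path(cfg: SpecConfigSketch) -> str:
    """Return which proposer implementation would handle this config."""
    if cfg.method == "eagle":
        return "eagle"
    if cfg.method == "mtp":
        # mtp keeps its own path only when pcp or the unpadded drafter
        # batch is requested; otherwise it is routed to eagle_proposer.
        if cfg.pcp_enabled or cfg.disable_padded_drafter_batch:
            return "mtp"
        return "eagle"
    raise ValueError(f"unsupported method: {cfg.method}")
```

For example, `select_proposer_path(SpecConfigSketch(method="mtp"))` selects
the eagle path, while the same call with `pcp_enabled=True` stays on the mtp
path.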

RFC: https://github.com/vllm-project/vllm-ascend/issues/5467

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
ut and test

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
lilinsiman
2026-02-02 19:15:31 +08:00
committed by GitHub
parent c08364f761
commit 7932255c06
4 changed files with 90 additions and 24 deletions

@@ -43,6 +43,7 @@ def set_ascend_forward_context(
     model_instance: torch.nn.Module = None,
     is_draft_model=False,
     skip_compiled: bool = False,
+    draft_attn_metadatas=None,
 ):
     """A context manager that stores the current forward context,
     can be attention metadata, etc.
@@ -61,6 +62,7 @@ def set_ascend_forward_context(
     with set_forward_context(**forward_context_kwargs):
         forward_context = get_forward_context()
+        forward_context.draft_attn_metadatas = draft_attn_metadatas
         from vllm_ascend.ops.fused_moe.moe_comm_method import get_moe_comm_method