[Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349)

### What this PR does / why we need it?

Overview: This pull request refactors speculative decoding for the Eagle and MTP proposers on Ascend hardware. It fixes a bug in which `draft_attn_metadatas` was lost, migrates the lmhead feature, and adds routing logic in `MtpProposer`.

Details:
1. Migrated the lmhead feature from mtp to eagle and normalized it in `eagle_proposer`.
2. Fixed the bug where `draft_attn_metadatas` was lost after enabling eagle mode in the merge graph.
3. Added routing for pcp and disable padded drafter batch: in mtp mode, if pcp and disable padded drafter batch are not enabled, the normalized `eagle_proposer` is used.

RFC: https://github.com/vllm-project/vllm-ascend/issues/5467

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT and tests.

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
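
The routing rule in item 3 of the details can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DraftConfig`, `select_proposer`, the flag names, and the returned strings are all hypothetical, and it assumes that enabling either pcp or disable padded drafter batch keeps mtp on its own path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftConfig:
    """Hypothetical settings holder; field names are illustrative only."""
    method: str                                # "eagle" or "mtp"
    pcp_enabled: bool = False                  # assumed flag for pcp
    disable_padded_drafter_batch: bool = False


def select_proposer(cfg: DraftConfig) -> str:
    """Return the name of the proposer that would handle drafting."""
    if cfg.method == "eagle":
        return "eagle_proposer"
    if cfg.method == "mtp":
        # Assumption: either flag keeps mtp on the dedicated MTP path.
        if cfg.pcp_enabled or cfg.disable_padded_drafter_batch:
            return "mtp_proposer"
        # Otherwise mtp is routed to the normalized eagle implementation.
        return "eagle_proposer"
    raise ValueError(f"unknown speculative method: {cfg.method!r}")
```

Under these assumptions, `select_proposer(DraftConfig(method="mtp"))` selects the eagle path, while setting `pcp_enabled=True` keeps the request on the MTP path.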
2026-02-02 19:15:31 +08:00
parent c08364f761
commit 7932255c06
4 changed files with 90 additions and 24 deletions
--- a/vllm_ascend/ascend_forward_context.py
+++ b/vllm_ascend/ascend_forward_context.py
@@ -43,6 +43,7 @@ def set_ascend_forward_context(
    model_instance: torch.nn.Module = None,
    is_draft_model=False,
    skip_compiled: bool = False,
+    draft_attn_metadatas=None,
 ):
    """A context manager that stores the current forward context,
    can be attention metadata, etc.
@@ -61,6 +62,7 @@ def set_ascend_forward_context(

    with set_forward_context(**forward_context_kwargs):
        forward_context = get_forward_context()
+        forward_context.draft_attn_metadatas = draft_attn_metadatas

        from vllm_ascend.ops.fused_moe.moe_comm_method import get_moe_comm_method