[Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349)
### What this PR does / why we need it?
Overview: This pull request refactors speculative decoding for Eagle and
MTP proposers on Ascend hardware. It fixes a bug related to
draft_attn_metadatas being lost, migrates the lmhead feature, and adds
routing logic in MtpProposer.
Details:
1. Migrated the lmhead feature from mtp to eagle and normalized it in
eagle_proposer.
2. Fixed the bug where draft_attn_metadatas was lost after enabling
eagle mode in the merge graph.
3. Added routing for pcp and disable padded drafter batch: in mtp
mode, if neither pcp nor disable padded drafter batch is enabled, the
normalized eagle_proposer implementation is used.
RFC: https://github.com/vllm-project/vllm-ascend/issues/5467
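
The routing rule in item 3 can be sketched as a small predicate. This is an illustrative sketch only, not the code in MtpProposer: the function name `should_route_mtp_to_eagle` and its two boolean parameters are hypothetical, and the assumption that the mtp-specific path is kept whenever either option is enabled is inferred from the PR title and item 3.

```python
def should_route_mtp_to_eagle(
    pcp_enabled: bool,
    disable_padded_drafter_batch: bool,
) -> bool:
    """Decide whether an mtp-mode request uses the shared eagle_proposer path.

    Hypothetical helper: both arguments stand in for whatever configuration
    flags the real proposer inspects.
    """
    # Assumption: the mtp-specific path is kept only when pcp or
    # disable padded drafter batch is enabled.
    return not (pcp_enabled or disable_padded_drafter_batch)


if __name__ == "__main__":
    for pcp in (False, True):
        for no_pad in (False, True):
            path = "eagle" if should_route_mtp_to_eagle(pcp, no_pad) else "mtp"
            print(f"pcp={pcp}, disable_padded_drafter_batch={no_pad} -> {path}")
```

Keeping the decision in one predicate makes the excluded combinations easy to cover in a unit test.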
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests (UT) and test runs.
- vLLM version: v0.14.1
- vLLM main: dc917cceb8
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@@ -43,6 +43,7 @@ def set_ascend_forward_context(
    model_instance: torch.nn.Module = None,
    is_draft_model=False,
    skip_compiled: bool = False,
    draft_attn_metadatas=None,
):
    """A context manager that stores the current forward context,
    can be attention metadata, etc.
@@ -61,6 +62,7 @@ def set_ascend_forward_context(

    with set_forward_context(**forward_context_kwargs):
        forward_context = get_forward_context()
        forward_context.draft_attn_metadatas = draft_attn_metadatas

        from vllm_ascend.ops.fused_moe.moe_comm_method import get_moe_comm_method
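
The pattern in the hunk above (accept an extra keyword argument, then attach it to the active forward context so later code can read it) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `toy_set_forward_context`, `toy_get_forward_context`, and `set_ascend_forward_context_sketch` are hypothetical stand-ins written for this example and are not the vLLM or vllm-ascend implementations.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical module-level holder for the active context.
_current_context = None


def toy_get_forward_context():
    """Return the active toy context, failing loudly if none is set."""
    assert _current_context is not None, "no forward context is set"
    return _current_context


@contextmanager
def toy_set_forward_context(**kwargs):
    """Install a toy context for the duration of the with-block."""
    global _current_context
    previous = _current_context
    _current_context = SimpleNamespace(**kwargs)
    try:
        yield
    finally:
        # Restore whatever was active before, even on error.
        _current_context = previous


@contextmanager
def set_ascend_forward_context_sketch(draft_attn_metadatas=None, **forward_context_kwargs):
    """Mirror the hunk: store draft_attn_metadatas on the active context."""
    with toy_set_forward_context(**forward_context_kwargs):
        forward_context = toy_get_forward_context()
        forward_context.draft_attn_metadatas = draft_attn_metadatas
        yield forward_context


if __name__ == "__main__":
    with set_ascend_forward_context_sketch(draft_attn_metadatas=["meta0"], num_tokens=4):
        # Code running inside the block can recover the metadata.
        print(toy_get_forward_context().draft_attn_metadatas)
```

Storing the value on the context object, rather than threading it through every call, lets code that runs inside the `with` block read it without signature changes.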