[Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349)

### What this PR does / why we need it?

Overview: This pull request refactors speculative decoding for Eagle and
MTP proposers on Ascend hardware. It fixes a bug related to
draft_attn_metadatas being lost, migrates the lmhead feature, and adds
routing logic in MtpProposer.

Details:
1. Migrated the lmhead feature from mtp to eagle and normalized it in
eagle_proposer.
2. Fixed the bug where draft_attn_metadatas was lost after enabling
eagle mode in the merge graph.
3. Added routing in MtpProposer: in mtp mode, the normalized
eagle_proposer is used unless pcp or disable padded drafter batch is
enabled, in which case the original mtp path is kept.
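
The routing rule in item 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SpecConfigSketch`, `select_proposer_path`, and the field names are hypothetical, and the inclusion of dcp alongside pcp is an assumption taken from the commit title.

```python
from dataclasses import dataclass


@dataclass
class SpecConfigSketch:
    """Hypothetical stand-in for the relevant config fields (not the real vLLM config)."""
    method: str                               # e.g. "mtp" or "eagle"
    pcp_size: int = 1                         # assumed: >1 means pcp is enabled
    dcp_size: int = 1                         # assumed: >1 means dcp is enabled
    disable_padded_drafter_batch: bool = False


def select_proposer_path(cfg: SpecConfigSketch) -> str:
    """Return which proposer code path would handle this config in the sketch."""
    if cfg.method != "mtp":
        # Non-mtp methods are unaffected by the routing described here.
        return cfg.method
    uses_context_parallel = cfg.pcp_size > 1 or cfg.dcp_size > 1
    if uses_context_parallel or cfg.disable_padded_drafter_batch:
        # Exception case: stay on the original mtp path.
        return "mtp"
    # Default case: mtp is routed to the normalized eagle path.
    return "eagle"
```

Under these assumptions, a plain mtp config goes to the eagle path, while enabling pcp/dcp or disabling the padded drafter batch keeps it on the mtp path.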

RFC: https://github.com/vllm-project/vllm-ascend/issues/5467

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
ut and test

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
lilinsiman
2026-02-02 19:15:31 +08:00
committed by GitHub
parent c08364f761
commit 7932255c06
4 changed files with 90 additions and 24 deletions


@@ -113,13 +113,10 @@ from vllm_ascend.spec_decode.eagle_proposer import EagleProposer
from vllm_ascend.spec_decode.medusa_proposer import MedusaProposer
from vllm_ascend.spec_decode.mtp_proposer import MtpProposer
from vllm_ascend.utils import (
    AscendDeviceType,
    enable_sp,
    get_ascend_device_type,
    is_drafter_moe_model,
    is_moe_model,
    lmhead_tp_enable,
    maybe_trans_nz,
    set_weight_prefetch_method,
)
from vllm_ascend.worker.npu_input_batch import NPUInputBatch
@@ -140,7 +137,6 @@ if TYPE_CHECKING:
else:
    xgr = LazyLoader("xgr", globals(), "xgrammar")
import torch_npu
# if true, allow tensor initialization and casting with internal format (e.g., NZ)
torch.npu.config.allow_internal_format = True