[Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349)

### What this PR does / why we need it?

Overview: This pull request refactors speculative decoding for Eagle and MTP proposers on Ascend hardware. It fixes a bug related to draft_attn_metadatas being lost, migrates the lmhead feature, and adds routing logic in MtpProposer.

Details:
1. Migrated the lmhead feature from mtp to eagle and normalized it in eagle_proposer.
2. Fixed the bug where draft_attn_metadatas was lost after enabling eagle mode in the merge graph.
3. Added the routing for pcp and disable padded drafter batch; in mtp mode, if pcp and disable padded drafter batch are not enabled, the normalized file eagle_proposer will be used.

RFC: https://github.com/vllm-project/vllm-ascend/issues/5467

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

ut and test

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
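
The routing rule in item 3 of the description can be sketched roughly as below. This is only an illustration of one reading of the title and description (MTP falls through to the shared Eagle path unless pcp/dcp or disable padded drafter batch is in effect). `DraftRoutingFlags` and `mtp_uses_eagle_path` are hypothetical names invented for this sketch; they are not the actual fields or methods in `MtpProposer`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftRoutingFlags:
    """Hypothetical stand-in for the settings the routing depends on."""

    pcp_size: int = 1  # assumed: > 1 means pcp is enabled
    dcp_size: int = 1  # assumed: > 1 means dcp is enabled
    disable_padded_drafter_batch: bool = False


def mtp_uses_eagle_path(flags: DraftRoutingFlags) -> bool:
    """Return True when MTP would be routed to the shared Eagle proposer.

    Assumption: any of pcp, dcp, or disable_padded_drafter_batch keeps
    MTP on its own code path; otherwise the Eagle path is used.
    """
    context_parallel = flags.pcp_size > 1 or flags.dcp_size > 1
    return not (context_parallel or flags.disable_padded_drafter_batch)
```

Under these assumptions, the default flags route MTP to the Eagle path, while setting `pcp_size=2` or `disable_padded_drafter_batch=True` keeps it on the MTP-specific path.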
2026-02-02 19:15:31 +08:00
parent c08364f761
commit 7932255c06
4 changed files with 90 additions and 24 deletions
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -113,13 +113,10 @@ from vllm_ascend.spec_decode.eagle_proposer import EagleProposer
 from vllm_ascend.spec_decode.medusa_proposer import MedusaProposer
 from vllm_ascend.spec_decode.mtp_proposer import MtpProposer
 from vllm_ascend.utils import (
-    AscendDeviceType,
    enable_sp,
-    get_ascend_device_type,
    is_drafter_moe_model,
    is_moe_model,
    lmhead_tp_enable,
-    maybe_trans_nz,
    set_weight_prefetch_method,
 )
 from vllm_ascend.worker.npu_input_batch import NPUInputBatch
@@ -140,7 +137,6 @@ if TYPE_CHECKING:
 else:
    xgr = LazyLoader("xgr", globals(), "xgrammar")

-import torch_npu

 # if true, allow tensor initialization and casting with internal format (e.g., NZ)
 torch.npu.config.allow_internal_format = True