[v0.18.0] Apply Eagle3 to MiniMax-M2.5 (#7619) (#7714)

### What this PR does / why we need it?
Apply Eagle3 to MiniMax-M2.5 to improve model performance. This change will be
discarded once the Eagle3 weights for MiniMax-M2.5 are released and the code
change is accepted by the official repo:
https://github.com/vllm-project/vllm/pull/37512/changes
backport: #7619

- vLLM version: v0.18.0
- vLLM main:
ed359c497a

Signed-off-by: limuyuan <limuyuan3@huawei.com>
Co-authored-by: limuyuan <limuyuan3@huawei.com>
This commit is contained in:
SparrowMu
2026-03-27 18:33:29 +08:00
committed by GitHub
parent 60e88d9541
commit 6fbd0049df
3 changed files with 312 additions and 24 deletions


@@ -137,6 +137,38 @@
# Remove this patch if upstream provides an official NPU graph-capture
# guidance / auto-configuration path for HCCL.
#
# 3. `vllm.config.speculative.SpeculativeConfig._verify_args`
# Why:
# Upstream vLLM's eagle3/extract_hidden_states restricts target model types
# via a whitelist. MiniMax-M2 should be allowed once the worker-side model
# can emit auxiliary hidden states.
# How:
# Monkey-patch `_verify_args` to bypass only the whitelist ValueError for
# MiniMax model_type when method is eagle3/extract_hidden_states.
# SpeculativeConfig is a Pydantic dataclass (`@config`); init validation calls
# `__pydantic_decorators__.model_validators["_verify_args"].func`, so that
# `Decorator.func` must be replaced (not only `SpeculativeConfig._verify_args`),
# then `rebuild_dataclass(SpeculativeConfig, force=True)`.
# If `VllmConfig` was imported earlier, also `rebuild_dataclass(VllmConfig, ...)`
# so nested `speculative_config` validation does not use a stale schema.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/37512
# Future Plan:
# Remove this patch once upstream whitelist includes MiniMax.
#
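The whitelist bypass in item 3 can be sketched as follows. This is a hypothetical, simplified stand-in: the real patch operates on a pydantic dataclass through `__pydantic_decorators__` and `rebuild_dataclass` as noted above, while this plain class illustrates only the "swallow exactly one ValueError" idea; the class body, whitelist contents, and model-type strings are invented for illustration.

```python
_ALLOWED_TARGETS = {"llama", "qwen3"}  # stand-in for the upstream whitelist

class SpeculativeConfig:  # simplified stand-in, not the real pydantic dataclass
    def __init__(self, method: str, target_model_type: str):
        self.method = method
        self.target_model_type = target_model_type
        self._verify_args()

    def _verify_args(self):
        if (self.method == "eagle3"
                and self.target_model_type not in _ALLOWED_TARGETS):
            raise ValueError(
                f"eagle3 not supported for {self.target_model_type}")

_orig_verify_args = SpeculativeConfig._verify_args

def _patched_verify_args(self):
    try:
        _orig_verify_args(self)
    except ValueError:
        # Swallow only the whitelist error for MiniMax + eagle3;
        # every other validation failure still propagates.
        if self.method == "eagle3" and self.target_model_type == "minimax_m2":
            return
        raise

SpeculativeConfig._verify_args = _patched_verify_args

cfg = SpeculativeConfig("eagle3", "minimax_m2")  # accepted after the patch
```

Note that re-raising anything that is not the specific whitelist case keeps all other validation intact, which is why the patch is narrower than simply replacing `_verify_args` with a no-op.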
# 4. `vllm.model_executor.models.registry` (spec decode aliases)
# Why:
# Some Eagle3 draft checkpoints may declare a MiniMax-specific architecture
# string while reusing the shared Eagle3 implementation.
# How:
# Register `Eagle3MiniMaxM2ForCausalLM` as an alias pointing to the
# existing Eagle3 implementation in the speculative decoding registry.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/37512
# Future Plan:
# Drop the alias once upstream registry includes it or the checkpoint
# standardizes architecture strings.
#
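The alias registration in item 4 amounts to pointing a second architecture string at the same implementation. A minimal sketch, assuming a dict-shaped registry (the real vLLM registry maps architecture strings to lazily-imported module/class pairs, so this shape is illustrative only):

```python
# Stand-in for the speculative-decoding model registry:
# architecture string -> (module name, class name).
SPEC_DECODE_REGISTRY = {
    "Eagle3LlamaForCausalLM": ("llama_eagle3", "Eagle3LlamaForCausalLM"),
}

# Alias: a MiniMax-specific architecture string declared by some Eagle3
# draft checkpoints reuses the shared Eagle3 implementation.
SPEC_DECODE_REGISTRY["Eagle3MiniMaxM2ForCausalLM"] = (
    SPEC_DECODE_REGISTRY["Eagle3LlamaForCausalLM"])
```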
# ** 8. File: platform/patch_kv_cache_interface.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.v1.kv_cache_interface.MLAAttentionSpec`
@@ -453,6 +485,31 @@
# Future Plan:
# Remove this patch when upstream supports MiniMax-M2 fp8 loading on NPU.
#
# 4. `vllm.model_executor.models.minimax_m2.MiniMaxM2Model.forward`
# Why:
# Eagle3 speculative decoding needs auxiliary hidden states from specific
# transformer layers of the target model.
# How:
# Extend `MiniMaxM2Model.forward` to optionally collect and return
# `(final_hidden_states, aux_hidden_states)` when `aux_hidden_state_layers`
# is set by the runtime.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/37512
# Future Plan:
# Remove this patch once upstream MiniMax-M2 integrates Eagle3 support.
#
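The optional collection pattern in item 4 can be sketched like this. The per-layer computation below is a toy placeholder (the real forward runs MiniMax-M2 transformer layers on tensors); only the capture-and-dual-return shape reflects the description above.

```python
from typing import Callable, Optional, Sequence

def forward(hidden_states,
            layers: Sequence[Callable],
            aux_hidden_state_layers: Optional[Sequence[int]] = None):
    aux_hidden_states = []
    for idx, layer in enumerate(layers):
        if aux_hidden_state_layers and idx in aux_hidden_state_layers:
            # Capture the hidden state entering this layer for the draft model.
            aux_hidden_states.append(hidden_states)
        hidden_states = layer(hidden_states)
    if aux_hidden_state_layers:
        # Eagle3 path: return both final and auxiliary hidden states.
        return hidden_states, aux_hidden_states
    return hidden_states  # unchanged behavior when the runtime sets nothing

# Toy "layers": each one just increments the state.
out, aux = forward(0, [lambda x: x + 1] * 4, aux_hidden_state_layers=(1, 3))
```

Returning the plain final state when `aux_hidden_state_layers` is unset is what keeps the patch backward compatible with callers that expect a single value.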
# 5. `vllm.model_executor.models.minimax_m2.MiniMaxM2ForCausalLM`
# Why:
# vLLM core uses SupportsEagle3-style methods to configure which layers
# should emit auxiliary hidden states.
# How:
# Inject `set_aux_hidden_state_layers` and default-layer getters onto
# `MiniMaxM2ForCausalLM` so vLLM can configure the target model.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/37512
# Future Plan:
# Remove this patch once upstream provides these methods on the model.
#
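The injection in item 5 is plain attribute assignment on the model class. A hedged sketch, with the class body and the default-layer choice invented for illustration (the real defaults come from the model config, not a hard-coded layer count):

```python
class MiniMaxM2ForCausalLM:  # stand-in for the real vLLM model class
    pass

def set_aux_hidden_state_layers(self, layers):
    # Record which transformer layers should emit auxiliary hidden states.
    self.aux_hidden_state_layers = tuple(layers)

def get_eagle3_aux_hidden_state_layers(self):
    # Hypothetical default: an early, a middle, and a late layer.
    num_layers = 24  # placeholder for the real config value
    return (2, num_layers // 2, num_layers - 3)

# Inject the SupportsEagle3-style hooks so vLLM core can configure
# the target model without the model class declaring them itself.
MiniMaxM2ForCausalLM.set_aux_hidden_state_layers = set_aux_hidden_state_layers
MiniMaxM2ForCausalLM.get_eagle3_aux_hidden_state_layers = (
    get_eagle3_aux_hidden_state_layers)

model = MiniMaxM2ForCausalLM()
model.set_aux_hidden_state_layers(model.get_eagle3_aux_hidden_state_layers())
```

Because Python resolves method calls through the class at call time, instances created before or after the injection both pick up the new hooks.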
# ** 18. File: worker/patch_minimax_m2_linear_attn.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.layers.mamba.linear_attn.MiniMaxText01RMSNormTP.__init__`