[Bugfix] Fix incorrect layer count for MTP models in update_aclgraph_sizes (#7064)

## Summary

- Fix incorrect layer count calculation for MTP (Multi-Token Prediction) models in `update_aclgraph_sizes()` function
- For MTP models, the draft model's layer count is stored in `num_nextn_predict_layers` or `mtp_num_hidden_layers` (for Qwen3.5), not in the standard `num_hidden_layers` field
- Directly accessing `draft.hf_config.num_hidden_layers` returns the main model's layer count instead of the MTP draft model's layer count

## Bug Description

In `vllm_ascend/utils.py`, the `update_aclgraph_sizes()` function calculates `resources_per_graph` for speculative decoding scenarios. When calculating the resources needed for the draft model, the original code directly accessed:

```python
resources_per_graph += draft.hf_config.num_hidden_layers + 1
```

This works correctly for standard draft models, but **fails for MTP models** (like DeepSeek-V3's MTP or Qwen3.5's MTP) because:

1. MTP models store their layer count in model-specific fields:
   - `num_nextn_predict_layers` (DeepSeek-V3 MTP)
   - `mtp_num_hidden_layers` (Qwen3.5 MTP)
2. The `num_hidden_layers` field in these models contains the **main model's** layer count, not the MTP layer count
3. This leads to **grossly overestimating** the `resources_per_graph`, which in turn causes the calculated `max_batch_sizes` to be unnecessarily small

## Fix

Use `draft.get_total_num_hidden_layers()` instead of directly accessing `draft.hf_config.num_hidden_layers`. This method correctly handles different model types through the `model_arch_config_convertor` infrastructure, returning the appropriate layer count for:

- Standard draft models → `num_hidden_layers`
- DeepSeek-V3 MTP → `num_nextn_predict_layers`
- Qwen3.5 MTP → `mtp_num_hidden_layers`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: wanghuanjun2113 <wanghuanjun2113@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
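
The size of the overestimate can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `draft_layer_count` and `resources_per_graph` are hypothetical helpers, not vLLM's `get_total_num_hidden_layers()` implementation, and the layer counts (61 main layers, 1 MTP layer, a 12-layer standard draft) are example values rather than figures taken from this change.

```python
from types import SimpleNamespace

# Hypothetical lookup order, for illustration only. vLLM resolves the layer
# count through its own config infrastructure, not through this helper.
MTP_LAYER_FIELDS = ("num_nextn_predict_layers", "mtp_num_hidden_layers")


def draft_layer_count(hf_config) -> int:
    """Return the MTP-specific layer count if present, else num_hidden_layers."""
    for field in MTP_LAYER_FIELDS:
        value = getattr(hf_config, field, None)
        if value is not None:
            return value
    return hf_config.num_hidden_layers


def resources_per_graph(main_layers: int, draft_layers: int) -> int:
    """Mirror the arithmetic in the diff: (layers + 1) for main and draft."""
    return (main_layers + 1) + (draft_layers + 1)


# Example MTP draft config: num_hidden_layers holds the main model's count.
mtp_config = SimpleNamespace(num_hidden_layers=61, num_nextn_predict_layers=1)
# Example standard draft config: only num_hidden_layers is defined.
standard_config = SimpleNamespace(num_hidden_layers=12)

before = resources_per_graph(61, mtp_config.num_hidden_layers)
after = resources_per_graph(61, draft_layer_count(mtp_config))
standard = resources_per_graph(61, draft_layer_count(standard_config))

print(before, after, standard)
```

With these example values the pre-fix arithmetic counts the main model's layers twice, so the per-graph resource estimate is nearly double the post-fix one, while the standard draft case is unaffected.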
2026-03-09 16:14:51 +08:00
parent 4b4961ba5f
commit dec04ec8d8
1 changed files with 4 additions and 1 deletions
--- a/vllm_ascend/utils.py
+++ b/vllm_ascend/utils.py
@@ -485,7 +485,10 @@ def update_aclgraph_sizes(vllm_config: VllmConfig) -> None:
    resources_per_graph = num_hidden_layers + 1
    # For suffix decoding, use the suffix path when no draft_model_config is provided.
    if (spec := vllm_config.speculative_config) and (draft := spec.draft_model_config):
-        resources_per_graph += draft.hf_config.num_hidden_layers + 1
+        # Use get_total_num_hidden_layers() to correctly handle MTP models,
+        # which store layer count in num_nextn_predict_layers or
+        # mtp_num_hidden_layers (for Qwen3.5) instead of num_hidden_layers.
+        resources_per_graph += draft.get_total_num_hidden_layers() + 1

    # TODO: Find out whether we need to take into account the pp_size
    num_comm_groups = sum(