[Bugfix] Fix setting of speculative_config.enforce_eager for dsv32 (#5945)

### What this PR does / why we need it?

This PR fixes how `speculative_config.enforce_eager` is handled for DeepSeek V3.2 with MTP. vLLM sets `speculative_config.enforce_eager` to `True` whenever deepseek_v32 is used with MTP. Since we support graph mode, we simply ignore that value here. However, this fix also implicitly ignores a user-provided `speculative_config.enforce_eager`, so we need to take care and remove the workaround once vLLM supports this feature.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

By CI.

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

Signed-off-by: Zetong Li <slippersss@126.com>
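
The workaround described in the PR text above can be pictured with a minimal sketch. This is not the code changed by this commit: `SpeculativeConfig`, `draft_uses_graph`, the `"mtp"` method string, and the `backend_supports_graph` flag are hypothetical stand-ins chosen for illustration, and only the `enforce_eager` field name and the `deepseek_v32` model type come from the description.

```python
from dataclasses import dataclass


@dataclass
class SpeculativeConfig:
    """Hypothetical stand-in for the real config object (illustration only)."""
    method: str
    enforce_eager: bool = False


def draft_uses_graph(spec_cfg: SpeculativeConfig,
                     model_type: str,
                     backend_supports_graph: bool) -> bool:
    """Decide whether the draft (MTP) model may run in graph mode."""
    if not backend_supports_graph:
        return False
    if model_type == "deepseek_v32" and spec_cfg.method == "mtp":
        # Upstream forces enforce_eager=True for this combination, so the
        # flag is ignored here. Side effect: a user-supplied value is
        # ignored as well, which is the caveat noted in the description.
        return True
    # For every other model the flag is honoured as usual.
    return not spec_cfg.enforce_eager


forced = SpeculativeConfig(method="mtp", enforce_eager=True)
print(draft_uses_graph(forced, "deepseek_v32", True))
print(draft_uses_graph(forced, "some_other_model", True))
```

Keeping the special case in a single branch like this would make the workaround easy to delete once upstream no longer forces the flag.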
2026-01-21 09:24:33 +08:00
parent 936d81a258
commit 1ab6cd4935
2 changed files with 16 additions and 1 deletions
--- a/vllm_ascend/spec_decode/mtp_proposer.py
+++ b/vllm_ascend/spec_decode/mtp_proposer.py
@@ -246,7 +246,7 @@ class MtpProposer(EagleProposer):
                -1]:
            num_input_tokens = self.vllm_config.pad_for_cudagraph(
                num_scheduled_tokens)
-        elif self.use_aclgraph  and num_tokens <= self.runner.cudagraph_batch_sizes[
+        elif self.use_aclgraph and num_tokens <= self.runner.cudagraph_batch_sizes[
                -1]:
            # Acl graph mode, add padding to the batch size
            num_input_tokens = self.vllm_config.pad_for_cudagraph(num_tokens)
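
For readers unfamiliar with the padding step in the branch above, here is a rough sketch of the idea. It assumes padding means rounding the token count up to the nearest captured graph size; `pad_to_capture_size` and the example sizes are hypothetical, and the real `pad_for_cudagraph` may be implemented differently.

```python
import bisect


def pad_to_capture_size(num_tokens: int, capture_sizes: list[int]) -> int:
    """Round num_tokens up to the nearest captured graph batch size.

    Hypothetical helper for illustration; not the vLLM implementation.
    """
    sizes = sorted(capture_sizes)
    if num_tokens > sizes[-1]:
        # Larger than any captured graph: no padding, mirroring the
        # `num_tokens <= cudagraph_batch_sizes[-1]` guard in the diff.
        return num_tokens
    # bisect_left finds the first captured size that is >= num_tokens.
    return sizes[bisect.bisect_left(sizes, num_tokens)]


print(pad_to_capture_size(5, [1, 2, 4, 8, 16]))  # → 8
```

Padding this way lets a batch of any size up to the largest captured size replay an already captured graph instead of falling back to eager execution.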