[Bugfix] Remove use_aclgraph in mtp_proposer and use use_cuda_graph (#6032)
### What this PR does / why we need it?
This PR removes `use_aclgraph` from `mtp_proposer` and uses `use_cuda_graph` instead, matching what `eagle_proposer` already does. The reasons for this change are described below.
There is a scenario where `use_aclgraph=True` while `use_cuda_graph=False`, e.g. when `async_scheduling=True` is enabled. When running DeepSeek V3.2, `common_attn_metadata.num_input_tokens` matters: it must equal the `num_input_tokens` actually fed into the model. In the above scenario, `use_aclgraph` accidentally pads `num_tokens` up to `num_input_tokens`, which happens to coincide with `common_attn_metadata.num_input_tokens`. But eager mode is triggered afterwards, so the padding is actually unnecessary. In other words, the code logic is incorrect even though the output looks fine.
Since `common_attn_metadata.num_input_tokens` should always reflect the `num_input_tokens` entering the model, we now update `common_attn_metadata.num_input_tokens = num_input_tokens` after padding. With that in place, we can safely use the normal `use_cuda_graph` flag instead of the problematic `use_aclgraph`.
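The padding decision described above can be sketched as follows. This is a minimal illustration of the invariant, not the actual vllm-ascend code; the names `CommonAttnMetadata`, `prepare_inputs`, and `graph_pad_size` are hypothetical stand-ins.

```python
# Hypothetical sketch: pad only when a captured graph will actually be
# replayed, and keep the attention metadata consistent with the token
# count that really enters the model. Names are illustrative only.

class CommonAttnMetadata:
    def __init__(self, num_input_tokens: int):
        self.num_input_tokens = num_input_tokens


def prepare_inputs(num_tokens: int, use_cuda_graph: bool,
                   graph_pad_size: int) -> CommonAttnMetadata:
    if use_cuda_graph and num_tokens <= graph_pad_size:
        # Graph replay: inputs are padded to the captured size.
        num_input_tokens = graph_pad_size
    else:
        # Eager mode: no padding, the raw token count is used.
        num_input_tokens = num_tokens
    # Update the metadata AFTER deciding the padding, so it always
    # matches the tokens actually fed into the model.
    return CommonAttnMetadata(num_input_tokens)
```

With `use_cuda_graph=False` (e.g. under async scheduling), no padding happens and the metadata stays equal to the raw `num_tokens`, which is exactly the invariant this PR restores.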
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
By CI.
- vLLM version: v0.13.0
- vLLM main:
2c24bc6996
Signed-off-by: Zetong Li <slippersss@126.com>
```diff
@@ -107,8 +107,6 @@ class EagleProposer(VllmEagleProposer):
         self.pcp_rank = self.runner.pcp_rank
         self.dcp_rank = self.runner.dcp_rank
 
-        self.use_aclgraph = self.runner._use_aclgraph()
-
         self.full_indices = range(
             self.runner.max_num_tokens * self.pcp_size * self.dcp_size +
             self.pcp_size * self.dcp_size * self.runner.max_num_reqs)
```