fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3910)

### What this PR does / why we need it?

1. Revert [bugfix for mtp in fullgraph](0948483642) and restore support once vLLM supports it.
2. Raise an error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len.
3. Fix a bug when max_num_seqs=14 in the mtp=2 scenario.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
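
A minimal sketch of the kind of check described in item 2, for illustration only. The function name `check_capture_sizes` and its error text are hypothetical and are not taken from this patch. It assumes `uniform_decode_query_len` is 1 plus the number of speculative tokens, so mtp=2 gives 3.

```python
from typing import Iterable, List


def check_capture_sizes(capture_sizes: Iterable[int],
                        uniform_decode_query_len: int) -> List[int]:
    """Hypothetical helper: reject capture sizes that are not integer
    multiples of uniform_decode_query_len."""
    if uniform_decode_query_len < 1:
        raise ValueError("uniform_decode_query_len must be >= 1")

    sizes = list(capture_sizes)
    # Collect every size that leaves a remainder when divided by the
    # per-request decode query length.
    bad = [s for s in sizes if s % uniform_decode_query_len != 0]
    if bad:
        raise ValueError(
            f"cudagraph_capture_sizes {bad} are not integer multiples of "
            f"uniform_decode_query_len={uniform_decode_query_len}")
    return sizes


# Assumed mapping: mtp=2 -> uniform_decode_query_len = 1 + 2 = 3.
# With max_num_seqs=14 the largest uniform decode batch is 14 * 3 = 42 tokens.
valid = check_capture_sizes([3, 6, 42], uniform_decode_query_len=3)
```

Under the same assumption, a list such as `[3, 8]` would be rejected because 8 is not a multiple of 3.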
2025-10-31 09:24:50 +08:00
parent 35a913cf1e
commit 1966885be2
4 changed files with 63 additions and 61 deletions
--- a/vllm_ascend/platform.py
+++ b/vllm_ascend/platform.py
@@ -306,7 +306,6 @@ class NPUPlatform(Platform):
                **********************************************************************************\033[0m
                """
                logger.warning(warning_message)
-                update_aclgraph_sizes(vllm_config)
            else:
                logger.info(
                    "%s cudagraph_mode is not support on NPU. falling back to NONE",
@@ -344,7 +343,6 @@ class NPUPlatform(Platform):
                **********************************************************************************\033[0m
                """
                logger.warning(warning_message)
-                update_aclgraph_sizes(vllm_config)
            else:
                logger.info(
                    "%s cudagraph_mode is not support on NPU. falling back to NONE",