mfix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_lentp (#3910)
### What this PR does / why we need it? 1. Revert [bugfix for mtp in fullgraph](0948483642) and support it when vllm supports 2. raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len 3. bugfix when max_num_seqs=14 in mtp=2 scenario ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main:83f478bb19--------- Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
This commit is contained in:
@@ -306,7 +306,6 @@ class NPUPlatform(Platform):
|
||||
**********************************************************************************\033[0m
|
||||
"""
|
||||
logger.warning(warning_message)
|
||||
update_aclgraph_sizes(vllm_config)
|
||||
else:
|
||||
logger.info(
|
||||
"%s cudagraph_mode is not support on NPU. falling back to NONE",
|
||||
@@ -344,7 +343,6 @@ class NPUPlatform(Platform):
|
||||
**********************************************************************************\033[0m
|
||||
"""
|
||||
logger.warning(warning_message)
|
||||
update_aclgraph_sizes(vllm_config)
|
||||
else:
|
||||
logger.info(
|
||||
"%s cudagraph_mode is not support on NPU. falling back to NONE",
|
||||
|
||||
Reference in New Issue
Block a user