[Graph][Bugfix] Set default cudagraph max capture size via platform defaults (#7572)

### What this PR does / why we need it?

This PR lets NPU platform provide its own default
`max_cudagraph_capture_size` via
`NPUPlatform.apply_config_platform_defaults()`.

Previously, when cudagraph sizing was left unset, Ascend inherited
vLLM's upstream default heuristic in `_set_cudagraph_sizes()`, which
uses `max_num_seqs * decode_query_len * 2`. This PR changes Ascend's
default to `min(max_num_seqs * decode_query_len, 512)` while keeping the
rest of vLLM's cudagraph sizing logic unchanged.
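The difference between the two heuristics can be sketched as standalone functions. These helpers are illustrative only (they are not the actual `NPUPlatform` or vLLM methods); the `max_num_seqs`, `decode_query_len`, and 512-cap values come from the description above.

```python
# Hedged sketch of the two default heuristics described above; these are
# illustrative stand-ins, not the real NPUPlatform / vLLM implementations.

def ascend_default_max_capture_size(max_num_seqs: int,
                                    num_speculative_tokens: int = 0) -> int:
    # Each decode step covers 1 token per sequence, plus any draft tokens
    # when speculative decoding is enabled.
    decode_query_len = 1 + num_speculative_tokens
    # Ascend default: no trailing `* 2`, capped at 512.
    return min(max_num_seqs * decode_query_len, 512)

def upstream_default_max_capture_size(max_num_seqs: int,
                                      num_speculative_tokens: int = 0) -> int:
    # Upstream vLLM heuristic: `max_num_seqs * decode_query_len * 2`.
    decode_query_len = 1 + num_speculative_tokens
    return max_num_seqs * decode_query_len * 2

print(ascend_default_max_capture_size(256))    # 256
print(upstream_default_max_capture_size(256))  # 512
```

For a typical `max_num_seqs=256` without speculative decoding, Ascend now defaults to 256 capture sizes' worth of max instead of 512, halving the default capture upper bound.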

### Does this PR introduce _any_ user-facing change?

Yes, but only for Ascend when users do not explicitly configure
cudagraph sizing.

If `max_cudagraph_capture_size` and `cudagraph_capture_sizes` are both
unset, we now use `max_num_seqs * decode_query_len` (capped at `512`)
instead of the upstream `* 2` default. Explicit user settings are
unchanged.

### How was this patch tested?

Add unit tests to cover:

- default max injection via `apply_config_platform_defaults()`
- explicit `max_cudagraph_capture_size` is preserved
- explicit `cudagraph_capture_sizes` are preserved
- Ascend default max no longer uses the upstream `* 2`
- late `_set_cudagraph_sizes()` recomputation reuses the current max input
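The precedence rules exercised by these tests can be sketched with plain assertions. The helper below reimplements the injection logic locally, and the config objects are `SimpleNamespace` stand-ins, not real vLLM config classes.

```python
# Hedged sketch of the test cases listed above; `default_max` and
# `apply_defaults` locally mirror the PR's logic, and SimpleNamespace
# stands in for vLLM's CompilationConfig.
from types import SimpleNamespace

def default_max(compilation, max_num_seqs, decode_query_len=1):
    # Mirrors _get_default_max_cudagraph_capture_size: explicit settings win.
    if compilation.max_cudagraph_capture_size is not None:
        return None
    if compilation.cudagraph_capture_sizes is not None:
        return None
    return min(max_num_seqs * decode_query_len, 512)

def apply_defaults(compilation, max_num_seqs):
    # Mirrors apply_config_platform_defaults: inject only when a default exists.
    d = default_max(compilation, max_num_seqs)
    if d is not None:
        compilation.max_cudagraph_capture_size = d

# Default max injection (note: not the upstream `256 * 2`).
cfg = SimpleNamespace(max_cudagraph_capture_size=None, cudagraph_capture_sizes=None)
apply_defaults(cfg, 256)
assert cfg.max_cudagraph_capture_size == 256

# Explicit max_cudagraph_capture_size is preserved.
cfg = SimpleNamespace(max_cudagraph_capture_size=64, cudagraph_capture_sizes=None)
apply_defaults(cfg, 256)
assert cfg.max_cudagraph_capture_size == 64

# Explicit cudagraph_capture_sizes leaves the max untouched.
cfg = SimpleNamespace(max_cudagraph_capture_size=None, cudagraph_capture_sizes=[8, 16])
apply_defaults(cfg, 256)
assert cfg.max_cudagraph_capture_size is None
```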

- vLLM version: v0.18.0
- vLLM main: ed359c497a

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>
Author: linfeng-yuan
Committed: 2026-03-25 17:57:19 +08:00 (by GitHub)
Parent: d452d04656
Commit: 05a561129e
2 changed files with 159 additions and 2 deletions


```diff
@@ -152,6 +152,49 @@ class NPUPlatform(Platform):
         config_deprecated_logging()
 
+    @classmethod
+    def _get_default_max_cudagraph_capture_size(cls, vllm_config: VllmConfig) -> int | None:
+        """Mirror the default-max branch in vLLM's `_set_cudagraph_sizes()`.
+
+        This helper corresponds to the upstream block under
+        "determine the initial max_cudagraph_capture_size" when
+        `compilation_config.max_cudagraph_capture_size is None`.
+
+        Ascend injects this default earlier via `apply_config_platform_defaults()`
+        so the rest of `_set_cudagraph_sizes()` can keep using upstream logic for
+        size-list generation, token-cap clipping, SP filtering, and later
+        post-processing. The only intentional difference from upstream is removing
+        the CUDA-oriented trailing `* 2`: Ascend wants the default capture upper
+        bound to track `max_num_seqs * decode_query_len`, capped at 512.
+
+        Returning `None` means the platform should not inject a default. This
+        covers the cases where the user has already provided either
+        `max_cudagraph_capture_size` or `cudagraph_capture_sizes`.
+        """
+        compilation_config = vllm_config.compilation_config
+        if compilation_config.max_cudagraph_capture_size is not None:
+            return None
+        if compilation_config.cudagraph_capture_sizes is not None:
+            return None
+
+        scheduler_config = getattr(vllm_config, "scheduler_config", None)
+        max_num_seqs = getattr(scheduler_config, "max_num_seqs", None)
+        if max_num_seqs is None:
+            return None
+
+        decode_query_len = 1
+        speculative_config = getattr(vllm_config, "speculative_config", None)
+        if speculative_config and speculative_config.num_speculative_tokens:
+            decode_query_len += speculative_config.num_speculative_tokens
+
+        return min(max_num_seqs * decode_query_len, 512)
+
+    @classmethod
+    def apply_config_platform_defaults(cls, vllm_config: VllmConfig) -> None:
+        default_max_cg_capture_size = cls._get_default_max_cudagraph_capture_size(vllm_config)
+        if default_max_cg_capture_size is not None:
+            vllm_config.compilation_config.max_cudagraph_capture_size = default_max_cg_capture_size
+
     @classmethod
     def get_device_capability(cls, device_id: int = 0):
         return None
```
```diff
@@ -273,7 +316,10 @@ class NPUPlatform(Platform):
         )
         compilation_config.cudagraph_mode = CUDAGraphMode.NONE
-        # set cudaprah sizes before extending `compilation_config.splitting_ops`
+        # Recompute cudagraph sizes after Ascend-specific compatibility updates.
+        # The platform default max is injected earlier via
+        # `apply_config_platform_defaults`, so this late pass should only honor
+        # the current max / size inputs after the mode adjustments above.
         vllm_config._set_cudagraph_sizes()
         # TODO delete graph size update here when compilation_config.pass_config.enable_sp
         # is supported by vllm-ascend.
```