xc-llm-ascend/vllm_ascend
Qiu 70713c3fd4 [cherry-pick][BugFix] Improve max_cudagraph_capture_size validation (#8252)
### What this PR does / why we need it?

This PR improves the validation of `max_cudagraph_capture_size` by
comparing it against the potential maximum tokens required for decoding,
derived from the scheduler configuration. It introduces a warning to
alert users when the capture size might be insufficient for the
workload, which could lead to suboptimal performance.
ref: #8227
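
As an illustration only, the check described above might be sketched as follows. The helper names, the logger name, and the formula `max_num_seqs * (1 + num_speculative_tokens)` are assumptions made for this example; the PR description does not state how the potential decode token count is derived from the scheduler configuration.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("capture_size_check")


def potential_decode_tokens(max_num_seqs: int, num_speculative_tokens: int = 0) -> int:
    """Upper bound on tokens in a single decode step.

    Assumed formula: each running sequence contributes one token plus
    any speculative tokens. The real derivation may differ.
    """
    return max_num_seqs * (1 + num_speculative_tokens)


def check_capture_size(
    max_cudagraph_capture_size: int,
    max_num_seqs: int,
    num_speculative_tokens: int = 0,
) -> bool:
    """Return True if the capture size covers the assumed decode bound.

    Logs a warning (and returns False) when it does not.
    """
    needed = potential_decode_tokens(max_num_seqs, num_speculative_tokens)
    if max_cudagraph_capture_size < needed:
        logger.warning(
            "max_cudagraph_capture_size (%d) is smaller than the potential "
            "decode workload (%d tokens); performance may be suboptimal.",
            max_cudagraph_capture_size,
            needed,
        )
        return False
    return True
```

In this sketch the check only warns and never raises, which matches the user-facing change described in this PR: a warning log rather than a hard failure.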

### Does this PR introduce _any_ user-facing change?
Yes. A warning is logged when `max_cudagraph_capture_size` is
smaller than the potential decode workload.

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2026-04-14 22:00:10 +08:00