Yizhou 1f25d60870 [Fix] Cap max tokens to prevent potential OOM (#3720)
### What this PR does / why we need it?
Caps the calculated maximum number of tokens at 512.

This prevents allocating an excessively large buffer when a cudagraph
capture size is not specified, mitigating the risk of out-of-memory
errors.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
None.

- vLLM version: v0.11.0rc3
- vLLM main: 17c540a993

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-25 11:23:21 +08:00