Files
xc-llm-ascend/vllm_ascend
zouyida2052 1966885be2 mfix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_lentp (#3910)
### What this PR does / why we need it?
1. Revert [bugfix for mtp in
fullgraph](0948483642)
and support it when vllm supports
2. raise error when cudagraph_capture_sizes can't be an integer multiple
of uniform_decode_query_len
3. bugfix when max_num_seqs=14 in mtp=2 scenario

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-31 09:24:50 +08:00
..
2025-10-25 15:36:32 +08:00
2025-10-25 15:53:01 +08:00