xc-llm-ascend/vllm_ascend
zouyida2052 90aca84e60 fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909)
### What this PR does / why we need it?
1. Revert [bugfix for mtp in fullgraph](0948483642) and support it again once vLLM supports it.
2. Raise an error when a value in `cudagraph_capture_sizes` is not an integer multiple of `uniform_decode_query_len`.
3. Fix a bug that occurs when `max_num_seqs=14` in the `mtp=2` scenario.
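
The check described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `validate_capture_sizes` and its error messages are hypothetical, and the example value `uniform_decode_query_len = 3` assumes that with `mtp=2` each decode request carries one token plus two speculative tokens.

```python
def validate_capture_sizes(capture_sizes, uniform_decode_query_len):
    """Hypothetical helper: reject capture sizes that are not integer
    multiples of the uniform decode query length."""
    if uniform_decode_query_len <= 0:
        raise ValueError(
            f"uniform_decode_query_len must be positive, "
            f"got {uniform_decode_query_len}"
        )
    # Collect every size that cannot hold a whole number of decode requests.
    bad = [s for s in capture_sizes if s % uniform_decode_query_len != 0]
    if bad:
        raise ValueError(
            f"cudagraph_capture_sizes {bad} are not integer multiples of "
            f"uniform_decode_query_len={uniform_decode_query_len}"
        )
    return list(capture_sizes)


# Assumed value for mtp=2: 1 token + 2 speculative tokens per request.
query_len = 3
validate_capture_sizes([3, 6, 42], query_len)  # passes: 42 = 14 * 3

try:
    validate_capture_sizes([4, 8], query_len)
except ValueError as err:
    print(err)
```

Failing fast at configuration time, rather than during graph capture, keeps the error message close to the setting that caused it.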

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-31 09:25:06 +08:00