xc-llm-ascend

Qiu ea6206bb18 [bugfix][ACLGraph][MTP] deletes cudagraph_batch_sizes in MtpProposer (#5183)

### What this PR does / why we need it?
This PR deletes `cudagraph_batch_sizes` in `MtpProposer` and reuses the
one in `NPUModelRunner`.

While stress-testing a cross-machine 2P2D deployment of DeepSeek-V3.2
with MTP using AISBench, we hit an error (see below). After
investigation, we found that
`compilation_config.cudagraph_capture_sizes` is modified by
`adjust_cudagraph_sizes_for_spec_decode` in `NPUModelRunner`. This
modification updates only the `cudagraph_batch_sizes` held by
`NPUModelRunner` and is not propagated to `MtpProposer`, which keeps
its own, now stale, copy. After discussion (CC @yiz-liu), we concluded
that `MtpProposer` does not need to maintain its own
`cudagraph_batch_sizes`; it should use the one from `NPUModelRunner`
directly.
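
The failure mode described above can be illustrated with a minimal
sketch. This is not the actual vllm-ascend code: `Runner`,
`StaleProposer`, `SharedProposer`, and the size-adjustment rule are all
hypothetical stand-ins, chosen only to show why a copied list drifts
out of sync while a delegated read does not.

```python
class Runner:
    """Hypothetical stand-in for NPUModelRunner."""

    def __init__(self, capture_sizes):
        self.cudagraph_batch_sizes = sorted(capture_sizes)

    def adjust_sizes_for_spec_decode(self, tokens_per_request):
        # Assumed adjustment rule, for illustration only: scale every
        # capture size by the number of tokens each request contributes.
        self.cudagraph_batch_sizes = [
            size * tokens_per_request for size in self.cudagraph_batch_sizes
        ]


class StaleProposer:
    """Keeps its own copy, mirroring the behaviour before this fix."""

    def __init__(self, runner):
        # Snapshot taken once; later changes in the runner are not seen.
        self.cudagraph_batch_sizes = list(runner.cudagraph_batch_sizes)


class SharedProposer:
    """Reads the runner's list on every access, as the fix intends."""

    def __init__(self, runner):
        self.runner = runner

    @property
    def cudagraph_batch_sizes(self):
        return self.runner.cudagraph_batch_sizes


runner = Runner([1, 2, 4])
stale = StaleProposer(runner)
shared = SharedProposer(runner)
runner.adjust_sizes_for_spec_decode(tokens_per_request=2)

print(runner.cudagraph_batch_sizes)
print(stale.cudagraph_batch_sizes)
print(shared.cudagraph_batch_sizes)
```

After the adjustment, the snapshot in `StaleProposer` no longer matches
the runner, whereas `SharedProposer` always sees the current sizes
because it holds no state of its own to synchronize.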

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>

2025-12-22 14:08:27 +08:00

e2e

[Performance] Add async exponential while model executing (#4501)

2025-12-20 21:23:21 +08:00

[bugfix][ACLGraph][MTP] deletes cudagraph_batch_sizes in MtpProposer (#5183)

2025-12-22 14:08:27 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00