xc-llm-ascend

Files

Li Wang cdece86f2c [Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass (#806)

### What this PR does / why we need it?

1. Fix the V1 error found by
[nightly_ci](https://github.com/vllm-project/vllm-ascend/actions/runs/14950004754/job/41998136610),
which was introduced by [[v1] Pass BlockTable and KVCacheSpec to
AttentionMetadataBuilders
#17483](https://github.com/vllm-project/vllm/pull/17483), by making the
`InputBatch` parameters consistent with vLLM.
2. Disable the benchmark and fix it upstream.

### Does this PR introduce _any_ user-facing change?

No


### How was this patch tested?

CI passed

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>

2025-05-12 00:36:56 +08:00

__init__.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00

cache_engine.py

support deepseek quant & mix-parallel with graphmode (#585)

2025-04-23 16:23:25 +08:00

draft_model_runner.py

[CI] upgrade vllm to 0.8.5 (#715)

2025-04-30 09:15:50 +08:00

model_runner_v1.py

[Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass (#806)