[Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass (#806)
### What this PR does / why we need it? 1. Fix V1 error found by [nightly_ci](https://github.com/vllm-project/vllm-ascend/actions/runs/14950004754/job/41998136610), broken by [[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483](https://github.com/vllm-project/vllm/pull/17483), make `InputBatch` parameter consistent with vllm. 2. Disable benmark and fix it in upstream. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
This commit is contained in:
@@ -39,6 +39,8 @@ norecursedirs =
|
||||
vllm-empty/tests/neuron
|
||||
; fastsafetensors not support npu now
|
||||
vllm-empty/tests/fastsafetensors_loader
|
||||
; Enable after https://github.com/vllm-project/vllm-ascend/issues/808 resolved
|
||||
vllm-empty/tests/benchmarks
|
||||
|
||||
addopts = --ignore=vllm-empty/tests/test_utils.py
|
||||
--ignore=vllm-empty/tests/test_config.py
|
||||
|
||||
Reference in New Issue
Block a user