[Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)
### What this PR does / why we need it?
Add configuration check logic for ascend scheduler: if chunked_prefill
is disabled, max_num_batched_tokens couldn't be less than max_model_len,
following vLLM;
### Does this PR introduce _any_ user-facing change?
users cannot set max_num_batched_tokens smaller than max_model_len with
ascend scheduler
### How was this patch tested?
CI and vllm serving passed
- vLLM version: v0.10.0
- vLLM main:
f77a0802b7
Signed-off-by: linfeng-yuan <1102311262@qq.com>
This commit is contained in:
@@ -16,7 +16,7 @@ def test_concurrent_partial_prefill():
|
||||
},
|
||||
},
|
||||
max_num_seqs=3,
|
||||
max_num_batched_tokens=200,
|
||||
max_num_batched_tokens=2048,
|
||||
enforce_eager=True,
|
||||
max_model_len=2048,
|
||||
gpu_memory_utilization=0.7) as vllm_model:
|
||||
@@ -35,7 +35,7 @@ def test_prefix_cache_stats_is_recorded():
|
||||
},
|
||||
},
|
||||
max_num_seqs=3,
|
||||
max_num_batched_tokens=200,
|
||||
max_num_batched_tokens=2048,
|
||||
enforce_eager=True,
|
||||
max_model_len=2048,
|
||||
gpu_memory_utilization=0.7) as vllm_model:
|
||||
|
||||
Reference in New Issue
Block a user