### What this PR does / why we need it?
Add configuration check logic for ascend scheduler: if chunked_prefill
is disabled, max_num_batched_tokens couldn't be less than max_model_len,
following vLLM;
### Does this PR introduce _any_ user-facing change?
users cannot set max_num_batched_tokens smaller than max_model_len with
ascend scheduler
### How was this patch tested?
CI and vllm serving passed
- vLLM version: v0.10.0
- vLLM main:
f77a0802b7
Signed-off-by: linfeng-yuan <1102311262@qq.com>
### What this PR does / why we need it?
Add uts for files in folder /core
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.9.2
- vLLM main:
5a19a6c670
---------
Signed-off-by: lwq <liwenquan5@huawei.com>
Co-authored-by: lwq <liwenquan5@huawei.com>
There are some duplicate tests for ascend scheduler. This PR remove them
to make the test clear.
After this PR. the singlecard e2e cost time is reduced from 47min to
46min.
- vLLM version: v0.9.2
- vLLM main:
1eb2b9c102
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>