### What this PR does / why we need it?

Add tests for chunked prefill and prefix cache on v1/AscendScheduler.

Covered scenarios:

- `Qwen/Qwen3-0.6B-Base` and `deepseek-ai/DeepSeek-V2-Lite-Chat` --- multicard CI time increased by 19 min
  - `V1 + default scheduler` vs `V1 + default scheduler + enable prefix cache`
  - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable prefix cache` vs `V1 + Ascend scheduler + enable prefix cache + enable chunked prefill`
- `Qwen/Qwen3-0.6B-Base` --- singlecard CI time increased by 8 min
  - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable chunked prefill`

Should rebase after #1498 and #1446.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI passed with the newly added tests.

Signed-off-by: MengqingCao <cmq0113@163.com>
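The covered scenarios above form a small test matrix over scheduler type, prefix caching, and chunked prefill. As a rough illustration (not the actual test code in this PR), the multicard combinations could be enumerated like this; `multicard_test_matrix` and its config keys are hypothetical names for this sketch:

```python
# Hypothetical sketch: enumerate the multicard scheduler/feature combinations
# described in the PR. Names and config keys are illustrative, not vLLM API.
def multicard_test_matrix():
    cases = []
    for model in ("Qwen/Qwen3-0.6B-Base", "deepseek-ai/DeepSeek-V2-Lite-Chat"):
        # V1 + default scheduler, with and without prefix cache
        for prefix in (False, True):
            cases.append({
                "model": model,
                "scheduler": "default",
                "enable_prefix_caching": prefix,
                "enable_chunked_prefill": False,
            })
        # V1 + Ascend scheduler: baseline, + prefix cache,
        # + prefix cache + chunked prefill
        for prefix, chunked in ((False, False), (True, False), (True, True)):
            cases.append({
                "model": model,
                "scheduler": "ascend",
                "enable_prefix_caching": prefix,
                "enable_chunked_prefill": chunked,
            })
    return cases

print(len(multicard_test_matrix()))  # → 10 combinations across the two models
```

In an end-to-end test, each pair (or triple) of configurations would be run against the same prompts and the generated outputs compared for equality, which is the usual way to validate that prefix caching and chunked prefill do not change results.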
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.

Briefly describe the major milestones in the development of artificial intelligence from 1950 to 2020.

Compare and contrast artificial intelligence with human intelligence in terms of processing information.

Describe the basic components of a neural network and how it can be trained.

Write a short story about a robot that dreams for the first time.

Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.

Explain the cultural significance of the Mona Lisa painting, and how its perception might vary in Western versus Eastern societies.

Translate the following English sentence into Japanese, French, and Swahili: 'The early bird catches the worm.'