[CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler (#1505)
### What this PR does / why we need it? Add test for chunked prefill and prefix cache on v1/AscendScheduler Covered scenarios: - `Qwen/Qwen3-0.6B-Base` and `deepseek-ai/DeepSeek-V2-Lite-Chat` --- multicard CI time increased by 19 min - `V1 + default scheduler` vs `V1 + default scheduler + enable prefix cache` - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable prefix cache` vs `V1 + Ascend scheduler + enable prefix cache + enable chunked prefill` - `Qwen/Qwen3-0.6B-Base` --- singlecard CI time increased by 8 min - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable chunked prefill` should rebase after #1498 and #1446 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new added test. Signed-off-by: MengqingCao <cmq0113@163.com>
This commit is contained in:
8
tests/e2e/prompts/example.txt
Normal file
8
tests/e2e/prompts/example.txt
Normal file
@@ -0,0 +1,8 @@
|
||||
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
|
||||
Briefly describe the major milestones in the development of artificial intelligence from 1950 to 2020.
|
||||
Compare and contrast artificial intelligence with human intelligence in terms of processing information.
|
||||
Describe the basic components of a neural network and how it can be trained.
|
||||
Write a short story about a robot that dreams for the first time.
|
||||
Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.
|
||||
Explain the cultural significance of the Mona Lisa painting, and how its perception might vary in Western versus Eastern societies.
|
||||
Translate the following English sentence into Japanese, French, and Swahili: 'The early bird catches the worm.'
|
||||
Reference in New Issue
Block a user