[CI] Enable chunked prefill by default (#4569)
Set `enable_chunked_prefill` to True by default for e2e tests so they keep the same behavior as vLLM.

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -118,7 +118,6 @@ def test_eagle_correctness(
     spec_model_name = eagle3_model_name() if use_eagle3 else eagle_model_name()
     with VllmRunner(
             model_name,
-            enable_chunked_prefill=True,
             max_num_seqs=1,
             max_num_batched_tokens=2048,
             gpu_memory_utilization=0.6,
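
The diff does not show the runner's signature, so the following is only a minimal sketch of the idea behind the change: once the flag defaults to True, call sites can stop passing it and a test can still opt out explicitly. `build_runner_kwargs` and the `"some-model"` placeholder are hypothetical names for illustration and are not part of `VllmRunner` or vLLM.

```python
from typing import Any


def build_runner_kwargs(
    model_name: str,
    *,
    enable_chunked_prefill: bool = True,  # assumed default, per the commit message
    **engine_kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments a test runner would forward."""
    return {
        "model": model_name,
        "enable_chunked_prefill": enable_chunked_prefill,
        **engine_kwargs,
    }


# Call site after the change: the flag is no longer passed explicitly.
default_kwargs = build_runner_kwargs(
    "some-model",  # placeholder model name
    max_num_seqs=1,
    max_num_batched_tokens=2048,
    gpu_memory_utilization=0.6,
)

# A test that needs chunked prefill disabled can still opt out.
opt_out_kwargs = build_runner_kwargs("some-model", enable_chunked_prefill=False)
```

Keeping the flag as a keyword-only parameter with a default means the test above behaves the same after the explicit argument is dropped, while any test that depends on non-chunked prefill has to say so at its call site.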