[V1] Make V1 engine backward compatible (#637)

### What this PR does / why we need it?
Automatically enforce eager mode in the V1 engine ahead of the upcoming CANN and
torch_npu releases, so callers no longer have to request it themselves.

### Does this PR introduce _any_ user-facing change?
After this change, users no longer need to set `enforce_eager=True` manually.
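
An illustrative offline-inference sketch of the difference (assumes `vllm` with the vllm-ascend plugin installed on an Ascend NPU host; the model name is a placeholder, not from this PR):

```python
from vllm import LLM, SamplingParams

# Before this change, running on the V1 engine required forcing eager mode:
#   llm = LLM(model="some-model", enforce_eager=True)

# After this change, eager mode is applied automatically on V1,
# so the flag can simply be dropped:
llm = LLM(model="some-model")

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```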

### How was this patch tested?
Tested with the regular offline inference examples.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Author: yiz-liu
Date: 2025-04-24 17:20:11 +08:00
Committed by: GitHub
Parent: bd70ce828c
Commit: d785e78563
4 changed files with 43 additions and 46 deletions
@@ -47,7 +47,6 @@ def test_models_distributed(model: str,
dtype=dtype,
tensor_parallel_size=4,
distributed_executor_backend=distributed_executor_backend,
enforce_eager=True,
) as vllm_model:
vllm_model.generate_greedy(example_prompts, max_tokens)