[V1] Make V1 engine backward compatible (#637)

### What this PR does / why we need it?
Enforce eager mode in the V1 engine ahead of the upcoming CANN and
torch_npu releases.

### Does this PR introduce _any_ user-facing change?
After this change, users will no longer need to manually set
enforce_eager=True.
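
The change above can be sketched in plain Python. `build_llm_kwargs` is a hypothetical helper (not part of vLLM) that only contrasts the engine arguments a user passes before and after this patch:

```python
# Hypothetical helper (not part of vLLM) illustrating the user-facing change:
# on the V1 engine, enforce_eager=True no longer needs to be passed manually.
def build_llm_kwargs(model: str, patched: bool = True) -> dict:
    """Return the keyword arguments a user would pass when creating an LLM."""
    kwargs = {
        "model": model,
        "max_model_len": 8192,           # values mirror the test settings
        "gpu_memory_utilization": 0.7,
    }
    if not patched:
        # Before this commit, V1 users had to force eager mode themselves.
        kwargs["enforce_eager"] = True
    return kwargs

before = build_llm_kwargs("some-model", patched=False)
after = build_llm_kwargs("some-model", patched=True)
print("enforce_eager" in before)  # True
print("enforce_eager" in after)   # False
```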

### How was this patch tested?
Tested with the regular offline inference examples.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
yiz-liu
2025-04-24 17:20:11 +08:00
committed by GitHub
parent bd70ce828c
commit d785e78563
4 changed files with 43 additions and 46 deletions

@@ -52,7 +52,7 @@ def test_models(model: str, dtype: str, max_tokens: int) -> None:
     with VllmRunner(model,
                     max_model_len=8192,
                     dtype=dtype,
-                    enforce_eager=True,
+                    enforce_eager=False,
                     gpu_memory_utilization=0.7) as vllm_model:
         vllm_model.generate_greedy(example_prompts, max_tokens)