[aclgraph] Implement NPUPiecewiseBackend to enable aclgraph (#836)

### What this PR does / why we need it?
1. Implement `NPUPiecewiseBackend` to enable aclgraph.
2. Enable aclgraph by default in V1, but raise an error when running
DeepSeek models and emit a warning when running any model other than Qwen.
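
The gating described in item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `check_aclgraph_support`, its parameters, the substring matching on the model type, and the exception type are all hypothetical and are not taken from this patch.

```python
import warnings


def check_aclgraph_support(model_type: str, enforce_eager: bool = False) -> bool:
    """Hypothetical helper: decide whether graph mode should be used.

    Returns False when eager mode is forced, raises for DeepSeek models,
    and warns for any model that is not a Qwen model.
    """
    if enforce_eager:
        # Eager mode was requested explicitly, so graph capture is skipped.
        return False

    name = model_type.lower()
    if "deepseek" in name:
        # Assumption: unsupported models fail fast with NotImplementedError.
        raise NotImplementedError(
            "aclgraph is not supported for DeepSeek models; "
            "set enforce_eager=True to run in eager mode."
        )
    if "qwen" not in name:
        # Other models are allowed to proceed, but the user is warned.
        warnings.warn(
            f"aclgraph has only been verified on Qwen models; "
            f"'{model_type}' may not work correctly in graph mode."
        )
    return True
```

Under these assumptions, a caller would pass `enforce_eager=True` to opt out of graph mode, which is the same switch the test change in this commit flips.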

### How was this patch tested?
CI passes with the new unit tests.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Mengqing Cao
2025-05-29 11:58:26 +08:00
committed by GitHub
parent cc74b97f74
commit a93bed4535
8 changed files with 380 additions and 33 deletions


@@ -52,7 +52,7 @@ def test_models(model: str, dtype: str, max_tokens: int) -> None:
     with VllmRunner(model,
                     max_model_len=8192,
                     dtype=dtype,
-                    enforce_eager=False,
+                    enforce_eager=True,
                     gpu_memory_utilization=0.7) as vllm_model:
         vllm_model.generate_greedy(example_prompts, max_tokens)