[aclgraph] implentment NPUPiecewiseBackend to enable aclgraph (#836)

### What this PR does / why we need it?
1. Implentment `NPUPiecewiseBackend` to enable aclgraph
2. Eable aclgraph by default in V1, but raise error when running
deepseek and raise warning when running models except for qwen

### How was this patch tested?
CI pass with the new ut

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
This commit is contained in:
Mengqing Cao
2025-05-29 11:58:26 +08:00
committed by GitHub
parent cc74b97f74
commit a93bed4535
8 changed files with 380 additions and 33 deletions

View File

@@ -72,7 +72,7 @@ def test_ngram_correctness(
with monkeypatch.context() as m:
m.setenv("VLLM_USE_V1", "1")
ref_llm = LLM(model=model_name, max_model_len=1024)
ref_llm = LLM(model=model_name, max_model_len=1024, enforce_eager=True)
ref_outputs = ref_llm.chat(test_prompts, sampling_config)
del ref_llm
@@ -85,6 +85,7 @@ def test_ngram_correctness(
"num_speculative_tokens": 3,
},
max_model_len=1024,
enforce_eager=True,
)
spec_outputs = spec_llm.chat(test_prompts, sampling_config)
matches = 0
@@ -135,6 +136,7 @@ def test_eagle_correctness(
"max_model_len": 2048,
},
max_model_len=2048,
enforce_eager=True,
)
spec_outputs = spec_llm.chat(test_prompts, sampling_config)
matches = 0