[aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836)

### What this PR does / why we need it?
1. Implement `NPUPiecewiseBackend` to enable aclgraph.
2. Enable aclgraph by default in V1, but raise an error when running DeepSeek and log a warning when running models other than Qwen.

### How was this patch tested?
CI passes with the new UT.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
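
The model gating described in item 2 could be sketched roughly as below. This is an illustration only, not the code in this patch: the helper name `check_aclgraph_support`, the substring match on the model type, and the choice to let `enforce_eager` bypass the check are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def check_aclgraph_support(model_type: str, enforce_eager: bool) -> bool:
    """Hypothetical helper: return True if aclgraph capture should be used."""
    # Assumption: eager mode skips graph capture entirely, so no check is needed.
    if enforce_eager:
        return False

    name = model_type.lower()
    # Assumption: DeepSeek models are identified by a substring of the model type.
    if "deepseek" in name:
        raise NotImplementedError(
            "aclgraph is not supported for DeepSeek models; set enforce_eager=True"
        )
    # Any model other than Qwen is allowed, but only with a warning.
    if "qwen" not in name:
        logger.warning("aclgraph is only verified on Qwen models; got %s", model_type)
    return True
```

Under these assumptions, the tests in this diff pass `enforce_eager=True` so that the reference and speculative runs both take the eager path and stay comparable.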
2025-05-29 11:58:26 +08:00
parent cc74b97f74
commit a93bed4535
8 changed files with 380 additions and 33 deletions
--- a/tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
+++ b/tests/long_term/spec_decode/e2e/test_v1_spec_decode.py
@@ -72,7 +72,7 @@ def test_ngram_correctness(
    with monkeypatch.context() as m:
        m.setenv("VLLM_USE_V1", "1")

-        ref_llm = LLM(model=model_name, max_model_len=1024)
+        ref_llm = LLM(model=model_name, max_model_len=1024, enforce_eager=True)
        ref_outputs = ref_llm.chat(test_prompts, sampling_config)
        del ref_llm

@@ -85,6 +85,7 @@ def test_ngram_correctness(
                "num_speculative_tokens": 3,
            },
            max_model_len=1024,
+            enforce_eager=True,
        )
        spec_outputs = spec_llm.chat(test_prompts, sampling_config)
        matches = 0
@@ -135,6 +136,7 @@ def test_eagle_correctness(
                "max_model_len": 2048,
            },
            max_model_len=2048,
+            enforce_eager=True,
        )
        spec_outputs = spec_llm.chat(test_prompts, sampling_config)
        matches = 0