[e2e Test][npugraph_ex]add static kernel e2e test case (#6320)

### What this PR does / why we need it?

Adds an E2E test case that enables the static kernel for npugraph_ex and monitors its compilation and unloading. Also fixes existing misspelled attribute names (`max_num_seq` to `max_num_seqs`, `cudagraph_capture_size` to `cudagraph_capture_sizes`).

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: chencangtao <chencangtao@huawei.com>
Co-authored-by: chencangtao <chencangtao@huawei.com>
2026-01-30 16:24:48 +08:00
parent 8969b94a14
commit f2990f7741
2 changed files with 34 additions and 2 deletions
--- a/vllm_ascend/compilation/compiler_interface.py
+++ b/vllm_ascend/compilation/compiler_interface.py
@@ -90,10 +90,10 @@ def npugraph_ex_compile(
        # affecting program execution.
        num_spec_tokens = vllm_config.speculative_config.num_speculative_token if vllm_config.speculative_config else 0
        uniform_decode_query_len = num_spec_tokens + 1
-        max_num_tokens = vllm_config.scheduler_config.max_num_seq * uniform_decode_query_len
+        max_num_tokens = vllm_config.scheduler_config.max_num_seqs * uniform_decode_query_len
        decode_cudagraph_batch_sizes = [
            x
-            for x in vllm_config.compilation_config.cudagraph_capture_size
+            for x in vllm_config.compilation_config.cudagraph_capture_sizes
            if max_num_tokens >= x >= uniform_decode_query_len
        ]
        config.experimental_config.aclgraph._aclnn_static_shape_kernel_sym_value_range = decode_cudagraph_batch_sizes
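
The filtering in this hunk can be checked in isolation. The following is a minimal sketch, not code from the PR: `decode_batch_sizes` is a hypothetical helper, and it takes plain integers instead of a real `vllm_config` object so that it runs without vLLM installed. It keeps only the capture sizes that lie between one decode step (`uniform_decode_query_len`) and the largest possible decode batch (`max_num_tokens`).

```python
def decode_batch_sizes(max_num_seqs, capture_sizes, num_spec_tokens=0):
    """Hypothetical helper mirroring the list comprehension in the hunk above."""
    # Each decode request contributes one token plus any speculative tokens.
    uniform_decode_query_len = num_spec_tokens + 1
    # Upper bound on tokens in a pure-decode batch.
    max_num_tokens = max_num_seqs * uniform_decode_query_len
    # Keep only capture sizes that a decode batch can actually reach.
    return [
        x
        for x in capture_sizes
        if max_num_tokens >= x >= uniform_decode_query_len
    ]


if __name__ == "__main__":
    # 4 sequences, 1 speculative token: query length is 2, max tokens is 8.
    print(decode_batch_sizes(4, [1, 2, 4, 8, 16], num_spec_tokens=1))  # [2, 4, 8]
```

With the misspelled attribute names the original expression would fail before reaching this filter, assuming the config objects do not define `max_num_seq` or `cudagraph_capture_size`.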