[Build] Move numba/quart to requirements and update DS baseline and sync graph typo fix (#1121)

### What this PR does / why we need it?

1. The numba/quart dependency was introduced by https://github.com/vllm-project/vllm-ascend/pull/874
   - Move numba/quart from requirements-dev to requirements
   - Align pyproject.toml with requirements
2. This patch also fixes the DeepSeek accuracy baseline, which https://github.com/vllm-project/vllm-ascend/pull/1118 did not address. According to https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite the gsm8k score is about `41.1`
3. This also syncs the vLLM upstream changes: eaa2e51088

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed:
- vllm ascend test (basic workflow)
- vllm longterm test (spec decode)

Closes: https://github.com/vllm-project/vllm-ascend/issues/1120

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
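For item 3, the upstream sync amounts to a renamed counter attribute: judging by the diff in this commit, vLLM 0.9.0 exposes the misspelled `num_cudagraph_caputured`, while newer vLLM uses `num_cudagraph_captured`. The following is a minimal, standalone sketch of that version gate, not the code in this patch. The `bump_capture_counter` helper, the `is_old_vllm` flag (standing in for `vllm_version_is("0.9.0")`), and the `SimpleNamespace` counters are all hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def bump_capture_counter(counter, is_old_vllm: bool) -> None:
    """Increment whichever capture-counter attribute the vLLM version exposes.

    `is_old_vllm` is a hypothetical stand-in for vllm_version_is("0.9.0").
    """
    # Per the diff, vLLM 0.9.0 uses the misspelled name; newer versions
    # use the corrected spelling.
    name = "num_cudagraph_caputured" if is_old_vllm else "num_cudagraph_captured"
    setattr(counter, name, getattr(counter, name) + 1)


# Hypothetical stand-ins for vLLM's compilation_counter object.
old_counter = SimpleNamespace(num_cudagraph_caputured=0)
new_counter = SimpleNamespace(num_cudagraph_captured=0)

bump_capture_counter(old_counter, is_old_vllm=True)
bump_capture_counter(new_counter, is_old_vllm=False)
```

Gating on the version string, as the patch does, keeps the choice explicit and easy to delete once support for 0.9.0 is dropped.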
2025-06-08 22:33:37 +08:00
parent f1543d5e0d
commit 4976b48b98
6 changed files with 37 additions and 13 deletions
--- a/vllm_ascend/compilation/piecewise_backend.py
+++ b/vllm_ascend/compilation/piecewise_backend.py
@@ -31,6 +31,8 @@ from vllm.config import VllmConfig
 from vllm.logger import logger
 from vllm.utils import weak_ref_tensors

+from vllm_ascend.utils import vllm_version_is
+

@dataclasses.dataclass
 class ConcreteSizeEntry:
@@ -205,7 +207,10 @@ class NPUPiecewiseBackend:
            entry.output = weak_ref_tensors(output)
            entry.aclgraph = aclgraph

-            compilation_counter.num_cudagraph_caputured += 1
+            if vllm_version_is("0.9.0"):
+                compilation_counter.num_cudagraph_caputured += 1
+            else:
+                compilation_counter.num_cudagraph_captured += 1

            # important: we need to return the output, rather than
            # the weak ref of the output, so that pytorch can correctly