[Build] Move numba/quart to requirements, update DS baseline, and sync graph typo fix (#1121)
### What this PR does / why we need it?
1. The dependency was introduced by
https://github.com/vllm-project/vllm-ascend/pull/874
- Move numba/quart from requirements-dev to requirements
- Align pyproject.toml with requirements
2. This patch also fixes the DeepSeek accuracy baseline, which
https://github.com/vllm-project/vllm-ascend/pull/1118 did not address.
According to https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite, the
gsm8k score is about `41.1`.
3. This also syncs the vLLM upstream change:
eaa2e51088
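
The upstream sync shown in this commit's diff is a version-gated attribute name: vLLM 0.9.0 exposes the misspelled `num_cudagraph_caputured`, while later versions use `num_cudagraph_captured`. The pattern can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: `LegacyCounter`, `FixedCounter`, `bump_captured`, and the `is_legacy` flag are hypothetical stand-ins for vLLM's `compilation_counter` and for the `vllm_version_is("0.9.0")` check.

```python
from dataclasses import dataclass


@dataclass
class LegacyCounter:
    """Hypothetical stand-in for the counter that uses the misspelled name."""
    num_cudagraph_caputured: int = 0


@dataclass
class FixedCounter:
    """Hypothetical stand-in for the counter after the typo fix."""
    num_cudagraph_captured: int = 0


def bump_captured(counter, is_legacy: bool) -> None:
    """Increment whichever attribute this counter version exposes.

    `is_legacy` plays the role of the version check in the real code.
    """
    if is_legacy:
        counter.num_cudagraph_caputured += 1
    else:
        counter.num_cudagraph_captured += 1


legacy, fixed = LegacyCounter(), FixedCounter()
bump_captured(legacy, is_legacy=True)
bump_captured(fixed, is_legacy=False)
print(legacy.num_cudagraph_caputured, fixed.num_cudagraph_captured)
```

Branching on the version keeps one code path working against both the old and the new attribute name until the older release is no longer supported.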
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
vllm ascend test (basic workflow)
vllm longterm test (spec decode)
Closes: https://github.com/vllm-project/vllm-ascend/issues/1120
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
@@ -31,6 +31,8 @@ from vllm.config import VllmConfig
 from vllm.logger import logger
 from vllm.utils import weak_ref_tensors
 
+from vllm_ascend.utils import vllm_version_is
+
 
 @dataclasses.dataclass
 class ConcreteSizeEntry:
@@ -205,7 +207,10 @@ class NPUPiecewiseBackend:
 entry.output = weak_ref_tensors(output)
 entry.aclgraph = aclgraph
 
-compilation_counter.num_cudagraph_caputured += 1
+if vllm_version_is("0.9.0"):
+    compilation_counter.num_cudagraph_caputured += 1
+else:
+    compilation_counter.num_cudagraph_captured += 1
 
 # important: we need to return the output, rather than
 # the weak ref of the output, so that pytorch can correctly