[refactor] refactor model runner capture model (#5230)

### What this PR does / why we need it?

Refactor the `capture_model` method in model_runner to directly reuse the method from vLLM. Most of the logic in `capture_model` currently duplicates the corresponding vLLM code, so using the vLLM method directly reduces the maintenance cost of the vllm-ascend code.

Changes:

1. Refactor the `capture_model` function to directly inherit the community (upstream vLLM) method.
2. Refactor the `initialize_aclgraph_capture` function, moving it to `initialize_attn_backend`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
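
The idea behind this refactor, reusing an inherited `capture_model` rather than maintaining a near-copy, can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: `UpstreamRunner`, `AscendRunner`, `_capture_one`, and the `sizes` argument are stand-ins, not the actual vLLM or vllm-ascend classes or signatures.

```python
class UpstreamRunner:
    """Hypothetical stand-in for the upstream (community) model runner."""

    def __init__(self):
        self.captured_sizes = []

    def capture_model(self, sizes):
        # Shared capture loop, maintained once in the upstream project.
        for size in sorted(sizes, reverse=True):
            self._capture_one(size)
        return len(self.captured_sizes)

    def _capture_one(self, size):
        # Generic per-size capture step.
        self.captured_sizes.append(("generic", size))


class AscendRunner(UpstreamRunner):
    """Hypothetical stand-in for the platform-specific runner."""

    def _capture_one(self, size):
        # Only the device-specific step is overridden.
        self.captured_sizes.append(("ascend", size))

    def capture_model(self, sizes):
        # Delegate to the inherited loop instead of duplicating it.
        return super().capture_model(sizes)


runner = AscendRunner()
count = runner.capture_model([1, 4, 2])
```

With this shape, a fix to the shared loop upstream reaches the subclass without any change on the platform side, which is the maintenance benefit the message describes.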
2025-12-30 08:32:14 +08:00
parent 5e96f94d2a
commit 15d73f248e
10 changed files with 142 additions and 254 deletions
--- a/tests/ut/attention/test_sfa_v1.py
+++ b/tests/ut/attention/test_sfa_v1.py
@@ -2,7 +2,6 @@ import sys
 from unittest.mock import MagicMock, patch

 import torch
-from vllm.v1.attention.backends.utils import AttentionCGSupport

 from tests.ut.base import TestBase
 from vllm_ascend.attention.attention_v1 import AscendAttentionState
@@ -98,7 +97,6 @@ class TestAscendSFAMetadataBuilder(TestBase):
                                           vllm_config=vllm_config,
                                           device=device)

-        assert builder.aclgraph_support == AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE
        assert builder.device == device
        assert builder.vllm_config == vllm_config