[refactor] refactor model runner capture model (#5230)
### What this PR does / why we need it?
Refactor the `capture_model` method in the model runner to reuse the
vLLM implementation directly.
Most of the logic in `capture_model` currently duplicates the vLLM code,
so calling the vLLM method directly reduces the maintenance cost of
vllm-ascend. Changes:
1. Refactor `capture_model` to inherit the community (upstream vLLM) method directly.
2. Refactor `initialize_aclgraph_capture` and move it into
`initialize_attn_backend`.
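
The first change can be pictured with a minimal sketch. The class names, the `_capture_one` hook, and the largest-first capture order below are hypothetical and chosen only for illustration; they are not the actual vLLM or vllm-ascend APIs. The point is that the subclass keeps no copy of `capture_model` and overrides only the backend-specific step.

```python
class UpstreamModelRunner:
    """Hypothetical stand-in for the upstream (vLLM) model runner."""

    def __init__(self, capture_sizes):
        # Illustrative assumption: capture the largest batch size first.
        self.capture_sizes = sorted(capture_sizes, reverse=True)
        self.captured = []

    def _capture_one(self, batch_size):
        # Backend-specific hook: subclasses decide how a graph is recorded.
        return f"cuda-graph[{batch_size}]"

    def capture_model(self):
        # Shared driver loop, maintained in one place only.
        for size in self.capture_sizes:
            self.captured.append(self._capture_one(size))
        return len(self.captured)


class AscendModelRunner(UpstreamModelRunner):
    """Hypothetical downstream runner: inherits capture_model unchanged."""

    def _capture_one(self, batch_size):
        return f"acl-graph[{batch_size}]"


runner = AscendModelRunner([1, 2, 4])
count = runner.capture_model()
```

With this shape, a fix to the shared loop in the parent class reaches the downstream runner without a matching downstream patch.
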
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
@@ -26,6 +26,8 @@ from vllm.platforms import Platform, PlatformEnum

# todo: please remove it when solve cuda hard code in vllm
os.environ["VLLM_DISABLE_SHARED_EXPERTS_STREAM"] = "1"
# todo: please remove it when support controls garbage collection during CUDA graph capture.
os.environ["VLLM_ENABLE_CUDAGRAPH_GC"] = "1"

from vllm_ascend.ascend_config import init_ascend_config
from vllm_ascend.utils import refresh_block_size
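
The hunk above sets both flags at module import time, before the remaining imports run. A minimal sketch of why the order matters follows. `read_flag` is a hypothetical reader written for this example and does not reproduce how vLLM actually parses its environment variables.

```python
import os

# Same assignments as in the diff: set the flags first ...
os.environ["VLLM_DISABLE_SHARED_EXPERTS_STREAM"] = "1"
os.environ["VLLM_ENABLE_CUDAGRAPH_GC"] = "1"


def read_flag(name: str, default: str = "0") -> bool:
    """Hypothetical reader: treats "1" as enabled, anything else as disabled."""
    return os.environ.get(name, default) == "1"


# ... so that any code reading them afterwards sees the overridden values.
gc_enabled = read_flag("VLLM_ENABLE_CUDAGRAPH_GC")
shared_stream_disabled = read_flag("VLLM_DISABLE_SHARED_EXPERTS_STREAM")
```

If the assignments ran after a consumer had already read and cached the values, the override would have no effect, which is presumably why they sit at the top of the module.
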
@@ -244,6 +246,12 @@ class NPUPlatform(Platform):
data_parallel_size,
)
compilation_config.use_inductor = False
# NOTE: Theoretically, we should also add vllm::mla_forward in the attention ops.
# Since the process is created in the spawn mode, the value of the class attribute
# attention ops transmitted is still the one before modification, so it has not been modified.
# This will cause in scenarios where both piecewise and splitting ops are configured simultaneously,
# If splitting ops does not contain the vllm::mla forward value, this configuration issue will
# not be detected in advance assert.
compilation_config.splitting_ops.extend(["vllm::mla_forward"])
update_aclgraph_sizes(vllm_config)
ascend_config.enable_npugraph_ex = False
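
The NOTE in this hunk appears to say that a missing `vllm::mla_forward` entry in the splitting ops would not be caught by an early assertion, so the platform code appends it explicitly. The sketch below illustrates that idea under stated assumptions: `CompilationConfigSketch`, `missing_splitting_ops`, and the `example::attention` op name are hypothetical. Only the `use_inductor` and `splitting_ops` fields and the `vllm::mla_forward` name come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class CompilationConfigSketch:
    """Hypothetical stand-in holding only the two fields used in the diff."""
    use_inductor: bool = True
    splitting_ops: list = field(default_factory=list)


def missing_splitting_ops(config, required):
    """Return the required ops that are absent from config.splitting_ops."""
    return [op for op in required if op not in config.splitting_ops]


# "example::attention" is a placeholder op name, not a real registered op.
config = CompilationConfigSketch(splitting_ops=["example::attention"])
before = missing_splitting_ops(config, ["vllm::mla_forward"])

# Mirror the platform code: disable inductor and append the MLA op explicitly.
config.use_inductor = False
config.splitting_ops.extend(["vllm::mla_forward"])
after = missing_splitting_ops(config, ["vllm::mla_forward"])
```

Here `before` lists the missing op and `after` is empty, which is the state the explicit `extend` call is meant to guarantee.
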