[BugFix][Fusion] Patch compile backend to make fusion available (#5308)
The vLLM PR https://github.com/vllm-project/vllm/pull/24252 currently
causes operator fusion to fail. This commit mitigates the failure by
patching the compile backend. Once the problem is fully resolved, I will
submit a follow-up pull request to remove the patch.
- vLLM version: release/v0.13.0
- vLLM main: 5fbfa8d9ef
---------
Signed-off-by: wxsIcey <1790571317@qq.com>
@@ -106,6 +106,20 @@
# Future Plan:
# Remove this patch when vLLM merges the PR.
#
# ** 7. File: platform/patch_compile_backend.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.compilation.backends.PiecewiseCompileInterpreter`
# `vllm.compilation.piecewise_backend.PiecewiseBackend`
# Why:
# vLLM removed the compiled graph for the general shape, which caused operator fusion to fail.
# This degrades model inference performance on Ascend.
# How:
# Recover the compiled graph for dynamic_shape in PiecewiseBackend.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/24252
# Future Plan:
# Remove this patch when the problem is fixed.
#
# * Worker Patch:
# ===============
#
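The patch note above describes recovering a general-shape compiled graph by replacing the backend class. A minimal sketch of that monkey-patching pattern follows. Everything in it is a hypothetical stand-in: the `upstream` module, the simplified `PiecewiseBackend`, `compile_for`, and `apply_patch` are illustration only and do not reflect the real vLLM classes or the contents of platform/patch_compile_backend.py.

```python
import types

# Hypothetical stand-in for an upstream module; NOT the real vLLM code.
upstream = types.ModuleType("upstream_backends")


class PiecewiseBackend:
    """Stand-in backend that compiles only for explicitly listed sizes."""

    def __init__(self, compile_sizes):
        self.compile_sizes = set(compile_sizes)
        self.compiled = {}

    def compile_for(self, shape):
        # Placeholder for real graph compilation (where fusion would happen).
        return f"graph[{shape}]"

    def __call__(self, batch_size):
        if batch_size in self.compile_sizes:
            if batch_size not in self.compiled:
                self.compiled[batch_size] = self.compile_for(batch_size)
            return self.compiled[batch_size]
        # Assumption for this sketch: without a general-shape graph,
        # other sizes fall back to an unfused path.
        return "eager"


upstream.PiecewiseBackend = PiecewiseBackend


class PatchedPiecewiseBackend(PiecewiseBackend):
    """Patched variant that also keeps one graph for the dynamic shape."""

    def __init__(self, compile_sizes):
        super().__init__(compile_sizes)
        self.general_graph = self.compile_for("dynamic")

    def __call__(self, batch_size):
        result = super().__call__(batch_size)
        # Use the general-shape graph whenever no size-specific one applies.
        return self.general_graph if result == "eager" else result


def apply_patch(module):
    """Swap the class on the module so later lookups get the patched one."""
    module.PiecewiseBackend = PatchedPiecewiseBackend


before = upstream.PiecewiseBackend([1, 2])(7)
apply_patch(upstream)
after = upstream.PiecewiseBackend([1, 2])(7)
specific = upstream.PiecewiseBackend([1, 2])(2)
```

Because the swap happens on the module attribute, only code that looks up `PiecewiseBackend` through the module after `apply_patch` runs sees the patched class. This is why such patches are usually applied at import time, before the original class is referenced elsewhere.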