[Fusion] normalize fusion naming and enable e2e test (#4693)

### What this PR does / why we need it?
This PR standardizes the fusion naming, changing
`enable_quantization_fusion` to `fuse_norm_quant`, and enables e2e
testing.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
Icey
2025-12-11 17:53:43 +08:00
committed by GitHub
parent 07c7131104
commit 18221c0e1d
8 changed files with 136 additions and 113 deletions

@@ -88,8 +88,7 @@ class NPUPlatform(Platform):
Get the custom compile backend. Previously, we used EagerAdaptor by default.
To use graph fusion operations, we defined our own backend compiler.
"""
-from vllm_ascend.compilation.compiler_interface import AscendCompiler
-return AscendCompiler.__module__ + "." + AscendCompiler.__name__
+return "vllm_ascend.compilation.compiler_interface.AscendCompiler"
@classmethod
def pre_register_and_update(cls,
@@ -225,8 +224,8 @@ class NPUPlatform(Platform):
if compilation_config.cudagraph_mode == CUDAGraphMode.FULL_AND_PIECEWISE:
compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
-from vllm_ascend.compilation.compiler_interface import AscendCompiler
-compilation_config.oot_compiler = AscendCompiler.__module__ + "." + AscendCompiler.__name__
+# get custom compile backend for graph fusion
+compilation_config.oot_compiler = cls.get_compile_backend()
if compilation_config.cudagraph_mode == CUDAGraphMode.NONE:
compilation_config.mode = CompilationMode.NONE
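
The hunks above contain two ways of producing the backend's dotted path: computing it from the imported class (`AscendCompiler.__module__ + "." + AscendCompiler.__name__`) and writing it as a string literal. The sketch below illustrates why the two are interchangeable and why the literal avoids an import at call time. It assumes the consumer of `oot_compiler` resolves the string by importing the module and looking up the attribute; that resolution step is not shown in this diff. `qualname_of` and `resolve_qualname` are hypothetical helper names, and `fractions.Fraction` stands in for `AscendCompiler` so the example runs without `vllm_ascend` installed.

```python
import importlib
from fractions import Fraction


def qualname_of(cls: type) -> str:
    """Build a dotted path from an already imported class."""
    return cls.__module__ + "." + cls.__name__


def resolve_qualname(path: str) -> type:
    """Import the module part of a dotted path and fetch the named attribute."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# Stand-in for the hardcoded AscendCompiler path (hypothetical example value).
hardcoded = "fractions.Fraction"

# The computed and hardcoded forms agree as long as the class is not moved or renamed.
assert qualname_of(Fraction) == hardcoded

# A consumer can turn the string back into the class without the caller importing it.
assert resolve_qualname(hardcoded) is Fraction
```

The trade-off is that a literal path will not follow a later move or rename of the class, so routing every caller through a single accessor such as `get_compile_backend()` keeps that string in one place.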