Files
xc-llm-ascend/tests/ut/base.py
Ronald1995 32a9c5f694 [Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)
### What this PR does / why we need it?
vLLM's `RowParallelLinear.forward` executes the allreduce and the matmul
as two separate operations. This PR patches the forward function to call
`torch_npu.npu_mm_all_reduce_base`, which runs the matmul and the
allreduce as a single fused kernel. This yields roughly a 20% performance
improvement in eager mode.
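
For illustration, a minimal sketch of the fused path (the function name, the
`hcomm_name` group handle, and the weight transpose are assumptions for this
sketch, not the PR's exact code):

```python
import torch
import torch_npu


def fused_row_parallel_forward(input_: torch.Tensor,
                               weight: torch.Tensor,
                               bias: torch.Tensor,
                               hcomm_name: str) -> torch.Tensor:
    # Instead of y = F.linear(input_, weight) followed by a separate
    # tensor-parallel allreduce, launch one fused matmul+allreduce kernel.
    # `hcomm_name` stands in for the HCCL communication group name; the
    # exact way the PR obtains it is not shown here.
    return torch_npu.npu_mm_all_reduce_base(input_,
                                            weight.t(),
                                            hcomm_name,
                                            reduce_op="sum",
                                            bias=bias)
```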
### Does this PR introduce _any_ user-facing change?
Yes. This PR introduces a new environment variable,
`VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE`, which controls whether the feature
is enabled.
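
For example, to enable it (assuming the usual vLLM convention that a boolean
env flag is set with a non-zero integer):

```bash
export VLLM_ASCEND_ENABLE_MATMUL_ALLREDUCE=1
```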

### How was this patch tested?
The patch is tested by adding a new test file, `test_patch_linear.py`, to
guard the patched behavior with unit tests.


- vLLM version: v0.10.0
- vLLM main: 7728dd77bb

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
2025-07-28 15:13:37 +08:00

45 lines
1.3 KiB
Python

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
import unittest

import pytest

from vllm_ascend.utils import adapt_patch, register_ascend_customop


class TestBase(unittest.TestCase):

    def setUp(self):
        # Adapt the global and worker patches by default, then register
        # the Ascend custom ops, before each test runs.
        adapt_patch(True)
        adapt_patch()
        register_ascend_customop()
        super().setUp()


class PytestBase:
    """Base class for pytest-style tests.

    pytest's mocker and parametrize features are not compatible with
    unittest.TestCase, so pytest tests need this separate base class.
    """

    @pytest.fixture(autouse=True)
    def setup(self):
        # Same per-test setup as TestBase, run automatically before each
        # test via the autouse fixture.
        adapt_patch(True)
        adapt_patch()
        register_ascend_customop()
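
# Hypothetical usage sketch (illustration only, not part of base.py):
# a unittest-style case inherits TestBase so the patches and custom ops
# are applied in setUp() before every test, while a pytest-style case
# inherits PytestBase and gets the same setup from the autouse fixture.

class TestExampleUnittest(TestBase):

    def test_patches_applied(self):
        # setUp() already ran adapt_patch() and register_ascend_customop().
        self.assertTrue(True)


class TestExamplePytest(PytestBase):

    def test_patches_applied(self):
        # The autouse `setup` fixture already ran before this test.
        assert True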