[Fusion] change fusion env variable (#6201)
### What this PR does / why we need it?
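Since CI has integrated Triton, `fuse_qknorm_rope` is now enabled by default (`fuse_qknorm_rope=True`). The `HAS_TRITON` override in `AscendCompilationConfig` is removed, so the flag takes exactly the value passed in; set it to `False` if Triton is not installed. The configuration table also documents `fuse_allreduce_rms`.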
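The practical effect of dropping `HAS_TRITON or ...` is easiest to see side by side. The following is a minimal sketch that mirrors only the attribute assignment from the `AscendCompilationConfig.__init__` diff. It is not the project's code: `resolve_old` and `resolve_new` are hypothetical names, and `HAS_TRITON` is passed in as a plain `has_triton` argument instead of being imported from `vllm.triton_utils`.

```python
def resolve_old(fuse_qknorm_rope: bool = False, has_triton: bool = True) -> bool:
    # Before this PR: HAS_TRITON overrode the caller's choice, so an explicit
    # False was ignored whenever Triton was importable.
    return has_triton or fuse_qknorm_rope


def resolve_new(fuse_qknorm_rope: bool = True) -> bool:
    # After this PR: the caller's value is used as-is, with True as the default.
    return fuse_qknorm_rope


# Triton installed, caller explicitly disables the pass:
print(resolve_old(fuse_qknorm_rope=False, has_triton=True))  # True (override wins)
print(resolve_new(fuse_qknorm_rope=False))                   # False (value honored)

# Triton missing, caller keeps the defaults:
print(resolve_old(has_triton=False))  # False
print(resolve_new())                  # True, so False must now be passed explicitly
```

In short, the change trades automatic detection for an explicit default: users can now turn the pass off, but environments without Triton must opt out themselves.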
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with new and existing tests.
- vLLM version: v0.14.0
- vLLM main: d68209402d
---------
Signed-off-by: wxsIcey <1790571317@qq.com>
@@ -76,7 +76,8 @@ The details of each configuration option are as follows:
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `fuse_norm_quant` | bool | `True` | Whether to enable fuse_norm_quant pass. |
-| `fuse_qknorm_rope` | bool | `False` | Whether to enable fuse_qknorm_rope pass. It's set to True by default when Triton is installed. |
+| `fuse_qknorm_rope` | bool | `True` | Whether to enable fuse_qknorm_rope pass. If Triton is not in the environment, set it to False. |
+| `fuse_allreduce_rms` | bool | `False` | Whether to enable fuse_allreduce_rms pass. It's set to False because of conflict with SP. |
 
 **eplb_config**
 
@@ -17,7 +17,6 @@ import os
 from typing import TYPE_CHECKING
 
 from vllm.logger import logger
-from vllm.triton_utils import HAS_TRITON
 from vllm.utils.math_utils import cdiv
 
 if TYPE_CHECKING:
@@ -190,7 +189,7 @@ class AscendCompilationConfig:
 """
 
 def __init__(
-self, fuse_norm_quant: bool = True, fuse_qknorm_rope: bool = False, fuse_allreduce_rms: bool = False, **kwargs
+self, fuse_norm_quant: bool = True, fuse_qknorm_rope: bool = True, fuse_allreduce_rms: bool = False, **kwargs
 ):
 """
 Initialize the configuration.
@@ -200,13 +199,13 @@ class AscendCompilationConfig:
 When set to True, the system will optimize norm and quant operations.
 Default: True
 fuse_qknorm_rope (bool): Whether to enable qknorm and rope fusion optimization.
-Default: False
+Default: True
 fuse_allreduce_rms (bool): Whether to enable allreduce and addrmsnorm fusion optimization.
 Default: False
 **kwargs: Additional optional parameters for forward compatibility and configuration extension.
 """
 self.fuse_norm_quant = fuse_norm_quant
-self.fuse_qknorm_rope = HAS_TRITON or fuse_qknorm_rope
+self.fuse_qknorm_rope = fuse_qknorm_rope
 self.fuse_allreduce_rms = fuse_allreduce_rms
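The updated table row says to set `fuse_qknorm_rope` to False when Triton is not in the environment. A minimal sketch of deriving that value with the standard library is below. `triton_available` and `fusion_kwargs` are hypothetical helpers; the keyword names match the `AscendCompilationConfig.__init__` parameters in the diff, but how these values are handed to vLLM Ascend is not shown in this commit and is left out.

```python
import importlib.util


def triton_available() -> bool:
    # find_spec returns None when the top-level "triton" package cannot be found.
    # It does not import the package, so it cannot tell whether Triton actually
    # works with the installed device backend.
    return importlib.util.find_spec("triton") is not None


def fusion_kwargs() -> dict:
    # Hypothetical helper: values for the three fusion flags touched by this
    # commit, keeping the documented defaults except for fuse_qknorm_rope.
    return {
        "fuse_norm_quant": True,
        "fuse_qknorm_rope": triton_available(),
        "fuse_allreduce_rms": False,
    }


print(fusion_kwargs())
```

Checking discoverability rather than importing keeps the probe cheap and side-effect free; an environment with a broken Triton install would still need the flag set to False by hand.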