[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)
### What this PR does / why we need it?
Add the muls_add Triton kernel with its related fusion pass. This PR also
refactors `AscendCompilationConfig` and deletes `NpugraphExConfig`.
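
For orientation, here is a minimal pure-Python reference of what a "muls_add" operation is assumed to compute: an elementwise multiply by a scalar followed by an add. The name `muls_add_reference`, its argument order, and the `x * scale + y` semantics are assumptions made for illustration. They are not taken from this PR's kernel code.

```python
from typing import Sequence


def muls_add_reference(
    x: Sequence[float], y: Sequence[float], scale: float
) -> list[float]:
    """Hypothetical reference: out[i] = x[i] * scale + y[i].

    Assumed semantics only; the real Triton kernel may differ in
    signature, dtype handling, and broadcasting.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # A fused kernel would do the multiply and the add in a single pass
    # over memory instead of materializing x * scale first.
    return [xi * scale + yi for xi, yi in zip(x, y)]


def unfused(x: Sequence[float], y: Sequence[float], scale: float) -> list[float]:
    """The two-step form that a fusion pass would presumably replace."""
    scaled = [xi * scale for xi in x]             # step 1: multiply by scalar
    return [s + yi for s, yi in zip(scaled, y)]   # step 2: add
```

Under these assumptions the fused and unfused forms return the same values, which is the property a fusion pass has to preserve.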
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
CI passed with the newly added test.
- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
@@ -31,7 +31,6 @@ The following table lists additional configuration options available in vLLM Asc
 | `finegrained_tp_config` | dict | `{}` | Configuration options for module tensor parallelism |
 | `ascend_compilation_config` | dict | `{}` | Configuration options for ascend compilation |
 | `eplb_config` | dict | `{}` | Configuration options for ascend compilation |
-| `npugraph_ex_config` | dict | `{}` | Configuration options for Npugraph_ex backend |
 | `refresh` | bool | `false` | Whether to refresh global Ascend configuration content. This is usually used by rlhf or ut/e2e test case. |
 | `dump_config_path` | str | `None` | Configuration file path for msprobe dump(eager mode). |
 | `enable_async_exponential` | bool | `False` | Whether to enable asynchronous exponential overlap. To enable asynchronous exponential, set this config to True. |
@@ -76,9 +75,12 @@ The details of each configuration option are as follows:

 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
+| `enable_npugraph_ex` | bool | `True` | Whether to enable npugraph_ex backend. |
+| `enable_static_kernel` | bool | `False` | Whether to enable static kernel. Suitable for scenarios where shape changes are minimal and some time is available for static kernel compilation. |
 | `fuse_norm_quant` | bool | `True` | Whether to enable fuse_norm_quant pass. |
 | `fuse_qknorm_rope` | bool | `True` | Whether to enable fuse_qknorm_rope pass. If Triton is not in the environment, set it to False. |
 | `fuse_allreduce_rms` | bool | `False` | Whether to enable fuse_allreduce_rms pass. It's set to False because of conflict with SP. |
+| `fuse_muls_add` | bool | `True` | Whether to enable fuse_muls_add pass.|

 **eplb_config**

@@ -91,16 +93,6 @@ The details of each configuration option are as follows:
 | `expert_map_record_path` | str | `None` | Save the expert load calculation results to a new expert table in the specified directory.|
 | `num_redundant_experts` | int | `0` | Specify redundant experts during initialization. |

-**npugraph_ex_config**
-
-| Name | Type | Default | Description |
-|------------------------| ---- |---------|----------------------------------------------------------------------------------------|
-| `enable` | bool | `True` | Whether to enable npugraph_ex backend. |
-| `enable_static_kernel` | bool | `False` | Whether to enable static kernel. Suitable for scenarios where shape changes are minimal and some time is available for static kernel compilation. |
-| `fuse_norm_quant` | bool | `True` | Whether to enable fuse_norm_quant pass. |
-| `fuse_qknorm_rope` | bool | `True` | Whether to enable fuse_qknorm_rope pass. If Triton is not in the environment, set it to False. |
-| `fuse_allreduce_rms` | bool | `False` | Whether to enable fuse_allreduce_rms pass. It's set to False because of conflict with SP. |
-
 ### Example

 An example of additional configuration is as follows:
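
The option tables in this diff suggest that settings formerly under `npugraph_ex_config` now belong under `ascend_compilation_config`, with `enable` appearing there as `enable_npugraph_ex`. The sketch below shows one way a caller might move an old-style dict to the new layout. The helper `migrate_npugraph_ex_config` is hypothetical and not part of vLLM Ascend, and the key mapping is inferred from the tables above rather than confirmed against the implementation.

```python
from copy import deepcopy
from typing import Any


def migrate_npugraph_ex_config(additional_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fold `npugraph_ex_config` into
    `ascend_compilation_config`, renaming `enable` to `enable_npugraph_ex`.

    The mapping is an assumption inferred from the documentation diff.
    """
    migrated = deepcopy(additional_config)  # leave the caller's dict untouched
    legacy = migrated.pop("npugraph_ex_config", None)
    if not legacy:
        return migrated

    compilation = migrated.setdefault("ascend_compilation_config", {})
    for key, value in legacy.items():
        new_key = "enable_npugraph_ex" if key == "enable" else key
        # Values already set under the new section take precedence.
        compilation.setdefault(new_key, value)
    return migrated


old_style = {
    "npugraph_ex_config": {"enable": True, "enable_static_kernel": False},
    "refresh": False,
}
new_style = migrate_npugraph_ex_config(old_style)
```

Here `new_style` carries `enable_npugraph_ex` and `enable_static_kernel` under `ascend_compilation_config`, and no longer contains `npugraph_ex_config`. Letting existing `ascend_compilation_config` values win is a design choice of this sketch, not documented behavior.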