[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

### What this PR does / why we need it? Add muls_add triton kernel with related fusion pass. What's more, this PR refactors `AscendCompilationConfig` and delete `NpugraphExConfig`. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? CI passed with new added test. - vLLM version: v0.13.0 - vLLM main: 45c1ca1ca1 --------- Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-03-02 17:54:25 +08:00
parent 8547520726
commit 16c879cdf7
14 changed files with 290 additions and 98 deletions
--- a/docs/source/user_guide/configuration/additional_config.md
+++ b/docs/source/user_guide/configuration/additional_config.md
@@ -31,7 +31,6 @@ The following table lists additional configuration options available in vLLM Asc
 | `finegrained_tp_config`             | dict | `{}`    | Configuration options for module tensor parallelism                                                       |
 | `ascend_compilation_config`         | dict | `{}`    | Configuration options for ascend compilation                                                              |
 | `eplb_config`                       | dict | `{}`    | Configuration options for ascend compilation |
-| `npugraph_ex_config`                | dict | `{}`    | Configuration options for Npugraph_ex backend                                                             |
 | `refresh`                           | bool | `false` | Whether to refresh global Ascend configuration content. This is usually used by rlhf or ut/e2e test case. |
 | `dump_config_path`                  | str  | `None`  | Configuration file path for msprobe dump(eager mode).                                                     |
 | `enable_async_exponential`          | bool | `False` | Whether to enable asynchronous exponential overlap. To enable asynchronous exponential, set this config to True.        |
@@ -76,9 +75,12 @@ The details of each configuration option are as follows:

 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
+| `enable_npugraph_ex`               | bool | `True` | Whether to enable npugraph_ex backend.                                                 |
+| `enable_static_kernel` | bool | `False` | Whether to enable static kernel. Suitable for scenarios where shape changes are minimal and some time is available for static kernel compilation. |
 | `fuse_norm_quant`  | bool | `True` | Whether to enable fuse_norm_quant pass. |
 | `fuse_qknorm_rope` | bool | `True` | Whether to enable fuse_qknorm_rope pass. If Triton is not in the environment, set it to False. |
 | `fuse_allreduce_rms` | bool | `False` | Whether to enable fuse_allreduce_rms pass. It's set to False because of conflict with SP. |
+| `fuse_muls_add` | bool | `True` | Whether to enable fuse_muls_add pass.|

 **eplb_config**

@@ -91,16 +93,6 @@ The details of each configuration option are as follows:
 | `expert_map_record_path`         | str | `None` | Save the expert load calculation results to a new expert table in the specified directory.|
 | `num_redundant_experts`          | int | `0`    | Specify redundant experts during initialization. |

-**npugraph_ex_config**
-
-| Name                   | Type | Default | Description                                                                            |
-|------------------------| ---- |---------|----------------------------------------------------------------------------------------|
-| `enable`               | bool | `True` | Whether to enable npugraph_ex backend.                                                 |
-| `enable_static_kernel` | bool | `False` | Whether to enable static kernel. Suitable for scenarios where shape changes are minimal and some time is available for static kernel compilation. |
-| `fuse_norm_quant`  | bool | `True` | Whether to enable fuse_norm_quant pass. |
-| `fuse_qknorm_rope` | bool | `True` | Whether to enable fuse_qknorm_rope pass. If Triton is not in the environment, set it to False. |
-| `fuse_allreduce_rms` | bool | `False` | Whether to enable fuse_allreduce_rms pass. It's set to False because of conflict with SP. |
-
 ### Example

 An example of additional configuration is as follows: