Commit Graph

2 Commits

liuchen2026fly
640ecd1b77 [BugFix] Fix muls_add fusion not working for GLM5 models (#6928)
### What this PR does / why we need it?
fix: support model-specific routed_scaling_factor in muls_add fusion
Previously, MulsAddFusionPass used a hardcoded scale=1.0, which failed
to match the `x * routed_scaling_factor + y` pattern in models such as
GLM5 that use routed_scaling_factor=2.5. As a result, the muls_add
fusion was skipped, leaving unfused mul+add operations.

This fix reads routed_scaling_factor from the model config (defaulting
to 1.0 for backward compatibility) and uses it as the pattern scale,
enabling correct fusion for GLM5 and other models with custom scaling
factors.

Fixes: Unoptimized mul+add in GLM5 attention blocks
Tested: GLM5-W8A8 with routed_scaling_factor=2.5
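The fix described above can be sketched as follows. This is a minimal illustration, not the actual vllm-ascend code: the function names (`get_scale_for_fusion`, `muls_add_matches`) and the dict-based config are hypothetical stand-ins for the real pass and model-config plumbing.

```python
# Hypothetical sketch: read routed_scaling_factor from the model config
# instead of baking scale=1.0 into the fusion pattern.

def get_scale_for_fusion(model_config: dict) -> float:
    # Default to 1.0 so models without the field keep the old behavior.
    return float(model_config.get("routed_scaling_factor", 1.0))

def muls_add_matches(graph_scale: float, pattern_scale: float) -> bool:
    # The fusion pass only fires when the multiplier found in the graph
    # equals the scale encoded in the pattern.
    return graph_scale == pattern_scale

# GLM5 uses routed_scaling_factor=2.5; with the old hardcoded 1.0 the
# pattern never matched, so mul+add stayed unfused.
glm5_cfg = {"routed_scaling_factor": 2.5}
assert muls_add_matches(2.5, get_scale_for_fusion(glm5_cfg))   # now fuses
assert not muls_add_matches(2.5, 1.0)                          # old behavior
assert get_scale_for_fusion({}) == 1.0                         # back-compat
```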
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: liuchenbing <chenliumail@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
2026-03-05 22:35:54 +08:00
whx
16c879cdf7 [Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)
### What this PR does / why we need it?
Add a muls_add Triton kernel with the related fusion pass. This PR also
refactors `AscendCompilationConfig` and deletes `NpugraphExConfig`.
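For reference, the fused muls_add op computes `out = x * scale + y` in a single pass. The plain-Python function below is only a stand-in illustrating the semantics, assuming that element-wise contract; the real Triton kernel tiles the tensors and runs on the accelerator.

```python
# Reference semantics of muls_add: out[i] = x[i] * scale + y[i].
# Pure-Python stand-in for the Triton kernel (illustrative only).

def muls_add(x, y, scale):
    return [xi * scale + yi for xi, yi in zip(x, y)]

# Fusing the scalar multiply into the add avoids materializing the
# intermediate tmp = x * scale, saving one memory round-trip.
print(muls_add([1.0, 2.0], [0.5, 0.5], 2.5))  # → [3.0, 5.5]
```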

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
CI passed with the newly added tests.


- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-03-02 17:54:25 +08:00