xc-llm-ascend/vllm_ascend at 640ecd1b772b1c3dcdc57336b762cc02d011eba8

EngineX/xc-llm-ascend

liuchen2026fly 640ecd1b77 [BugFix] Fix muls_add fusion not working for GLM5 models (#6928)

### What this PR does / why we need it?
fix: support model-specific routed_scaling_factor in muls_add fusion
Previously, MulsAddFusionPass used a hardcoded scale=1.0, which failed
to match the x * routed_scaling_factor + y pattern in models like GLM5
that use routed_scaling_factor=2.5. This caused the muls_add fusion to
be skipped, leaving unoptimized mul+add operations.

This fix reads routed_scaling_factor from model config (defaulting to
1.0 for backward compatibility) and uses it as the pattern scale, enabling
correct fusion for GLM5 and other models with custom scaling factors.
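
A minimal sketch of the lookup described above, not the actual MulsAddFusionPass code. The helper names `get_pattern_scale` and `muls_add_reference` are hypothetical, and the assumption that the factor is a plain attribute on the model config object is made only for illustration.

```python
from types import SimpleNamespace


def get_pattern_scale(model_config, default=1.0):
    """Return the scale used to match the x * scale + y pattern.

    Hypothetical helper: assumes routed_scaling_factor, when present,
    is an attribute of the config object. Falls back to `default`
    when the attribute is missing or None.
    """
    value = getattr(model_config, "routed_scaling_factor", None)
    return default if value is None else float(value)


def muls_add_reference(x, y, scale):
    """Unfused reference for the pattern: elementwise x * scale + y."""
    return [xi * scale + yi for xi, yi in zip(x, y)]


# Stand-ins for real model configs (assumed shapes, for illustration only).
glm5_like = SimpleNamespace(routed_scaling_factor=2.5)
legacy = SimpleNamespace()  # no routed_scaling_factor attribute

scale_glm5 = get_pattern_scale(glm5_like)
scale_legacy = get_pattern_scale(legacy)
fused_like = muls_add_reference([1.0, 2.0], [0.5, 0.5], scale_glm5)
```

With a hardcoded scale of 1.0, a pattern built for `x * 1.0 + y` would not describe a graph that multiplies by 2.5, which is the mismatch the commit message reports.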

Fixes: Unoptimized mul+add in GLM5 attention blocks
Tested: GLM5-W8A8 with routed_scaling_factor=2.5

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: liuchenbing <chenliumail@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>

2026-03-05 22:35:54 +08:00

[300I][Bugfix] fix unquant model weight nd2nz error (#6851)

2026-03-03 15:57:26 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[BugFix] [dcp] Fix GQA Model Error when Enable both DP and DCP (#7012)

2026-03-05 16:51:08 +08:00

[BugFix] Fix muls_add fusion not working for GLM5 models (#6928)

2026-03-05 22:35:54 +08:00

[P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (#6898)

2026-03-02 23:24:03 +08:00

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[main] ADXL/HIXL supports FabricMem Mode (#6806)

2026-03-05 21:04:11 +08:00

[EPLB] The profiling can collect the time required for adjusting the eplb. (#7001)

2026-03-05 16:10:57 +08:00

[Main2Main] Upgrade vLLM to 0226 (#6813)

2026-02-27 16:05:21 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

Add GemmaRmsNorm ACLGraph Support (#6473)

2026-03-05 16:15:07 +08:00

[Feat]fused_qkvzba_split_reshape supports token number greater than 65536 (#6740)

2026-03-05 14:41:38 +08:00

[bugfix]Qwen-Omni quantization model_type bugfix (#7007)

2026-03-05 16:34:34 +08:00

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

[Spec Decode]clean up spec decode interface (#6947)

2026-03-05 14:30:10 +08:00

[BugFix][MTP] Fix prefill misclassified as decode when prompt tokens == num_spec_tokens + 1 (#6835)

2026-03-05 17:33:10 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

ascend_forward_context.py

add mxfp8 moe quantization (#6670)

2026-03-02 11:04:06 +08:00

batch_invariant.py

[Feature] Add docs of batch invariance and make some extra operators patch (#6910)

2026-03-05 09:12:40 +08:00

cpu_binding.py

[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945)

2026-03-03 17:20:52 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618)

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)

2026-03-02 17:54:25 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[perf][refactor] Refactor and optimize sfa_v1.py for dsv3.2/glm5 (#6874)

2026-03-05 14:27:11 +08:00