Files
xc-llm-ascend/docs/source/user_guide/feature_guide/npugraph_ex.md
whx 16c879cdf7 [Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518)
### What this PR does / why we need it?
Add muls_add triton kernel with related fusion pass. What's more, this
PR refactors `AscendCompilationConfig` and delete `NpugraphExConfig`.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
CI passed with new added test.


- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-03-02 17:54:25 +08:00

994 B

Npugraph_ex

Introduction

As introduced in the RFC, this is a simple ACLGraph graph mode acceleration solution based on Fx graphs.

Using npugraph_ex

Npugraph_ex will be enabled by default in the future, Take Qwen series models as an example to show how to configure it.

Offline example:

from vllm import LLM

model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "ascend_compilation_config": {
            "enable_npugraph_ex": True,
            "enable_static_kernel": False,
        }
    }
)
outputs = model.generate("Hello, how are you?")

Online example:

vllm serve Qwen/Qwen2-7B-Instruct
--additional-config '{"ascend_compilation_config":{"enable_npugraph_ex":true, "enable_static_kernel":false}}'

You can find more details about npugraph_ex here