# Npugraph_ex

## Introduction

As introduced in the [RFC](https://github.com/vllm-project/vllm-ascend/issues/4715), npugraph_ex is a simple ACLGraph graph-mode acceleration solution based on FX graphs.

## Using npugraph_ex

Npugraph_ex will be enabled by default in a future release. The examples below use a Qwen series model to show how to configure it.

Offline example:

```python
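from vllm import LLM

# Enable npugraph_ex through the Ascend-specific compilation config.
model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "ascend_compilation_config": {
            "enable_npugraph_ex": True,
            "enable_static_kernel": False,
        }
    },
)
outputs = model.generate("Hello, how are you?")
# Print the generated text of the first (and only) request.
print(outputs[0].outputs[0].text)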
```

Online example:

```shell
vllm serve Qwen/Qwen2-7B-Instruct \
  --additional-config '{"ascend_compilation_config":{"enable_npugraph_ex":true, "enable_static_kernel":false}}'
```
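The offline and online examples pass the same settings, once as a Python dict and once as a JSON string. As a minimal sketch (the variable names are illustrative and not part of vLLM), the JSON argument can be generated from the dict with the standard library, which avoids hand-editing mistakes such as writing Python's `True` where JSON expects `true`:

```python
import json
import shlex

# Same settings as in the offline example.
additional_config = {
    "ascend_compilation_config": {
        "enable_npugraph_ex": True,
        "enable_static_kernel": False,
    }
}

# json.dumps converts Python True/False into JSON true/false.
config_json = json.dumps(additional_config, separators=(",", ":"))

# shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
command = (
    "vllm serve Qwen/Qwen2-7B-Instruct "
    f"--additional-config {shlex.quote(config_json)}"
)
print(command)
```

The printed command can be pasted into a shell as-is. Keeping the dict as the single source of truth makes it easier to keep offline and online configurations in sync.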

For more details about npugraph_ex, see the [Ascend documentation](https://www.hiascend.com/document/detail/zh/Pytorch/730/modthirdparty/torchairuseguide/torchair_00021.html).