[doc][npugraph_ex]add npugraph_ex introduction doc (#6306)

### What this PR does / why we need it?
As part of the preparation work for the
[RFC](https://github.com/vllm-project/vllm-ascend/issues/6214),
we have added documentation for npugraph_ex that introduces its usage
and its FX graph optimization.
The introduction to FX graph optimization also includes specific
explanations of the default passes, the implementation methods for
custom fusion passes, and how to capture the FX graph during the
optimization process through environment variable configuration.

---------

Signed-off-by: chencangtao <chencangtao@huawei.com>
Co-authored-by: chencangtao <chencangtao@huawei.com>
ChenCangtao
2026-01-30 11:21:37 +08:00
committed by GitHub
parent 1d661bb279
commit 46cee945b3
4 changed files with 138 additions and 0 deletions


@@ -0,0 +1,35 @@
# Npugraph_ex
## Introduction
As introduced in the [RFC](https://github.com/vllm-project/vllm-ascend/issues/4715), npugraph_ex is a simple ACLGraph graph-mode acceleration solution based on FX graphs.
## Using npugraph_ex
Npugraph_ex will be enabled by default in the future. The following examples use a Qwen series model to show how to configure it.
Offline example:
```python
from vllm import LLM

model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "npugraph_ex_config": {
            "enable": True,
            "enable_static_kernel": False,
        }
    },
)
outputs = model.generate("Hello, how are you?")
Online example:
```shell
vllm serve Qwen/Qwen2-7B-Instruct \
    --additional-config '{"npugraph_ex_config":{"enable":true, "enable_static_kernel":false}}'
```
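The JSON passed to `--additional-config` mirrors the `additional_config` dict from the offline example. As a minimal sketch, the flag value can be generated from that same dict so the two configurations stay in sync. The helper name `build_serve_command` is hypothetical and is not part of vllm or vllm-ascend; it only builds a command string using the Python standard library.

```python
import json
import shlex

# Same settings as in the offline example.
additional_config = {
    "npugraph_ex_config": {
        "enable": True,
        "enable_static_kernel": False,
    }
}


def build_serve_command(model: str, config: dict) -> str:
    """Return a shell-quoted `vllm serve` command line (hypothetical helper)."""
    # json.dumps converts Python True/False into JSON true/false.
    config_json = json.dumps(config, separators=(",", ":"))
    # shlex.join quotes each argument so the JSON survives shell parsing.
    return shlex.join(["vllm", "serve", model, "--additional-config", config_json])


command = build_serve_command("Qwen/Qwen2-7B-Instruct", additional_config)
print(command)
```

Generating the string this way avoids hand-written quoting mistakes, such as Python-style `True` in place of JSON `true`.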
You can find more details about npugraph_ex [here](https://www.hiascend.com/document/detail/zh/Pytorch/730/modthirdparty/torchairuseguide/torchair_00021.html).