### What this PR does / why we need it?
Add a muls_add Triton kernel and its related fusion pass. This PR also
refactors `AscendCompilationConfig` and deletes `NpugraphExConfig`.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
CI passed with the newly added test.
- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
# Npugraph_ex

## Introduction
As introduced in the RFC, npugraph_ex is a simple ACLGraph graph-mode acceleration solution based on FX graphs.
## Using npugraph_ex
Npugraph_ex will be enabled by default in the future. The following examples use Qwen series models to show how to configure it.
Offline example:
```python
from vllm import LLM

model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "ascend_compilation_config": {
            "enable_npugraph_ex": True,
            "enable_static_kernel": False,
        }
    },
)
outputs = model.generate("Hello, how are you?")
```
Online example:
```bash
vllm serve Qwen/Qwen2-7B-Instruct \
    --additional-config '{"ascend_compilation_config":{"enable_npugraph_ex":true, "enable_static_kernel":false}}'
```
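The offline and online examples carry the same settings, once as a Python dict and once as a JSON string. One way to keep them in sync is to generate the command line from the dict. The following is only a sketch: `build_serve_command` is a hypothetical helper, not part of vLLM or vllm-ascend, and the config keys are copied from the examples above.

```python
import json
import shlex

# Same settings as the offline example; key names are taken from the examples above.
additional_config = {
    "ascend_compilation_config": {
        "enable_npugraph_ex": True,
        "enable_static_kernel": False,
    }
}


def build_serve_command(model: str, config: dict) -> str:
    """Hypothetical helper: render a `vllm serve` command line for a config dict."""
    # json.dumps turns Python True/False into JSON true/false.
    config_json = json.dumps(config, separators=(",", ":"))
    parts = [
        "vllm",
        "serve",
        shlex.quote(model),
        "--additional-config",
        shlex.quote(config_json),  # single-quotes the JSON for the shell
    ]
    return " ".join(parts)


command = build_serve_command("Qwen/Qwen2-7B-Instruct", additional_config)
print(command)
```

Serializing with `json.dumps` avoids hand-editing the JSON string, where writing Python's `True` instead of JSON's `true` is an easy mistake.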
You can find more details about npugraph_ex here