Npugraph_ex
Introduction
As introduced in the RFC, Npugraph_ex is a simple graph mode acceleration solution for ACLGraph, based on FX graphs.
Using Npugraph_ex
Npugraph_ex will be enabled by default in the future. The following examples use a Qwen series model to show how to configure it.
Offline example:
from vllm import LLM

model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "ascend_compilation_config": {
            "enable_npugraph_ex": True,
            "enable_static_kernel": False,
        }
    },
)
outputs = model.generate("Hello, how are you?")
Online example:
vllm serve Qwen/Qwen2-7B-Instruct \
    --additional-config '{"ascend_compilation_config":{"enable_npugraph_ex":true, "enable_static_kernel":false}}'
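The offline and online examples carry the same settings in two notations: a Python dict with `True`/`False`, and a JSON string with `true`/`false`. The sketch below is one possible way to keep them in sync by generating the command line from the dict. It uses only the Python standard library. The variable names `additional_config`, `config_json`, and `command` are illustrative and not part of vLLM.

```python
import json
import shlex

# Same settings as the offline example, written as a Python dict.
additional_config = {
    "ascend_compilation_config": {
        "enable_npugraph_ex": True,
        "enable_static_kernel": False,
    }
}

# json.dumps converts Python True/False into JSON true/false.
config_json = json.dumps(additional_config)

# shlex.join quotes the JSON so the shell passes it as a single argument.
command = shlex.join(
    ["vllm", "serve", "Qwen/Qwen2-7B-Instruct", "--additional-config", config_json]
)
print(command)
```

Printing the command rather than running it lets you inspect the quoting before you start the server.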
You can find more details about npugraph_ex