### What this PR does / why we need it?

As part of the preparation work for the [RFC](https://github.com/vllm-project/vllm-ascend/issues/6214), we have added documentation for npugraph_ex that explains its usage and its FX graph optimization. The FX graph optimization section covers the default passes, how to implement custom fusion passes, and how to capture the FX graph during optimization by configuring environment variables.

---------

Signed-off-by: chencangtao <chencangtao@huawei.com>
Co-authored-by: chencangtao <chencangtao@huawei.com>
# Feature Guide

This section provides an overview of the features implemented in vLLM Ascend. Developers can refer to this guide to understand how vLLM Ascend works.

:::{toctree}
:caption: Feature Guide
:maxdepth: 1
patch
ModelRunner_prepare_inputs
disaggregated_prefill
eplb_swift_balancer.md
ACL_Graph
KV_Cache_Pool_Guide
add_custom_aclnn_op
context_parallel
quantization
npugraph_ex
:::