[Feat]Xlite Qwen3-vl Support (#5228)
### What this PR does / why we need it?
This patch adds support for the Qwen3-VL model in Xlite. For more
details about Xlite, see
https://atomgit.com/openeuler/GVirt/blob/master/xlite/README.md.
The latest performance comparison data between xlite and the default
aclgraph mode is as follows:
### Does this PR introduce _any_ user-facing change?
XLite graph mode supports the Qwen3-VL model.
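
As a minimal sketch of what enabling this looks like, the snippet below builds an `xlite_graph_config` section using the two options documented in the configuration table (`enabled` and `full_mode`, both defaulting to `False`). The helper name `build_additional_config` is hypothetical, and how the resulting mapping reaches the engine (assumed here to be vLLM's additional-config mechanism) is an assumption, not something this patch states.

```python
import json


def build_additional_config(enabled: bool = True, full_mode: bool = False) -> dict:
    """Return a config mapping with an xlite_graph_config section.

    Hypothetical helper for illustration only. The option names come from
    the xlite_graph_config table in the docs; everything else is assumed.
    """
    return {"xlite_graph_config": {"enabled": enabled, "full_mode": full_mode}}


# Decode-only xlite: full_mode stays at its documented default of False.
config = build_additional_config(enabled=True)

# JSON form, e.g. for passing the configuration on a command line (assumed usage).
config_json = json.dumps(config)
print(config_json)
```

Setting `full_mode=True` in the same mapping would request xlite for both the prefill and decode stages, per the option's description.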
### How was this patch tested?
vLLM version: v0.12.0
- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c
Signed-off-by: lvjunqi <lvjunqi1@huawei.com>
Co-authored-by: lvjunqi <lvjunqi1@huawei.com>
@@ -49,7 +49,7 @@ The details of each configuration option are as follows:
 **xlite_graph_config**
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable xlite graph mode. Currently only Llama or Qwen dense series models are supported. |
+| `enabled` | bool | `False` | Whether to enable xlite graph mode. Currently only Llama, Qwen dense series models, and Qwen3-vl are supported. |
 | `full_mode` | bool | `False` | Whether to enable xlite for both the prefill and decode stages. By default, xlite is only enabled for the decode stage. |

 **weight_prefetch_config**