[0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041)

### What this PR does / why we need it?
This PR replaces the vision tower in the Qwen2.5-Omni-Thinker model,
Qwen2_5_VisionTransformer, with AscendQwen2_5_VisionTransformer, which
uses QKV padding to improve performance.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
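The idea behind the QKV padding mentioned above can be sketched as follows: pad the attention projection dimension up to a hardware-friendly multiple before the kernel runs, then slice the padding off afterwards. This is a minimal illustration only; the helper names (`padded_head_dim`, `pad_vector`, `unpad_vector`) and the alignment value of 128 are hypothetical, not taken from the vllm-ascend source.

```python
import math

def padded_head_dim(head_dim: int, align: int = 128) -> int:
    """Round head_dim up to the next multiple of `align` (alignment value is illustrative)."""
    return math.ceil(head_dim / align) * align

def pad_vector(vec: list[float], target: int) -> list[float]:
    """Zero-pad a per-head vector to the padded dimension."""
    return vec + [0.0] * (target - len(vec))

def unpad_vector(vec: list[float], head_dim: int) -> list[float]:
    """Drop the padding after attention to recover the original shape."""
    return vec[:head_dim]

# Example: a head dim of 80 would be padded to 128 for the kernel,
# then sliced back to 80 afterwards.
q = pad_vector([1.0] * 80, padded_head_dim(80))
```

The real implementation pads the QKV projection weights inside the transformer rather than individual vectors, but the shape bookkeeping follows the same pad-then-slice pattern.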

Signed-off-by: Ting FU <futing10@huawei.com>
Author: tingfu
Date: 2025-11-08 13:56:05 +08:00
Committed by: GitHub
Parent: d4e2a44307
Commit: f9842560cb
3 changed files with 66 additions and 0 deletions

```diff
@@ -23,6 +23,10 @@ def register_model():
             "Qwen2_5_VLForConditionalGeneration",
             "vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration"
         )
+        ModelRegistry.register_model(
+            "Qwen2_5OmniModel",
+            "vllm_ascend.models.qwen2_5_omni_thinker:AscendQwen2_5OmniThinkerForConditionalGeneration"
+        )
     else:
         ModelRegistry.register_model(
             "Qwen2_5_VLForConditionalGeneration",
```