[MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349)

### What this PR does / why we need it?

- [x] Patch `Qwen2_5_VisionAttention` with `AscendQwen2_5_VisionAttention`.
- [x] Replace `AscendQwen2_5_VisionTransformer` with `Qwen2_5_VisionTransformer` in vllm.
- [x] Move the padding logic (q/k/v and cos/sin) that runs before FA into `forward()` of `Qwen2_5_VisionAttention`.
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative form to intervals and move it to CPU (for compatibility with NPU FA).
- [x] Remove Qwen2.5-VL modeling files.
- [x] Remove Qwen2.5-VL (without padding) modeling files.
- [x] Remove related UT.
- [x] Make `set_forward_context` pluggable when getting MM embedding. Find more details at https://github.com/vllm-project/vllm/pull/29388.
- [x] Simplify padding logic for FA.
- [x] Add patch for https://github.com/vllm-project/vllm/pull/28798.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark

- vLLM version: v0.11.2

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
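
The `cu_seqlens` conversion listed in the description (cumulative form to intervals) can be illustrated with a minimal sketch. This is not the code from this PR: the function name `cu_seqlens_to_lengths` is hypothetical, and plain Python lists stand in for the tensors that the attention layer works with. It only shows what "cumulative form" versus "intervals" means.

```python
from typing import List


def cu_seqlens_to_lengths(cu_seqlens: List[int]) -> List[int]:
    """Turn cumulative sequence offsets into per-sequence lengths.

    Hypothetical helper for illustration only. Given offsets such as
    [0, 4, 9, 15], each sequence spans cu_seqlens[i]..cu_seqlens[i + 1],
    so its length is the difference between neighbouring offsets.
    """
    if len(cu_seqlens) < 2:
        # With fewer than two offsets there is no complete sequence.
        return []
    return [end - start for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three sequences packed back to back: lengths 4, 5 and 6.
lengths = cu_seqlens_to_lengths([0, 4, 9, 15])
```

On tensors, the same adjacent-difference step would presumably be followed by a host transfer, since the description says the result is moved to CPU for the NPU FA kernel.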
2025-11-28 14:23:00 +08:00
parent bdc66972db
commit e52ebf8674
9 changed files with 802 additions and 2100 deletions
--- a/vllm_ascend/models/__init__.py
+++ b/vllm_ascend/models/__init__.py
@@ -1,7 +1,5 @@
 from vllm import ModelRegistry

-import vllm_ascend.envs as envs_ascend
-

 def register_model():
    ModelRegistry.register_model(
@@ -10,24 +8,11 @@ def register_model():

    ModelRegistry.register_model(
        "Qwen3VLMoeForConditionalGeneration",
-        "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration"
-    )
+        "vllm_ascend.models.qwen3_vl:AscendQwen3VLMoeForConditionalGeneration")

    ModelRegistry.register_model(
        "Qwen3VLForConditionalGeneration",
-        "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration"
-    )
-
-    if envs_ascend.USE_OPTIMIZED_MODEL:
-        ModelRegistry.register_model(
-            "Qwen2_5_VLForConditionalGeneration",
-            "vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration"
-        )
-    else:
-        ModelRegistry.register_model(
-            "Qwen2_5_VLForConditionalGeneration",
-            "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen2_5_VLForConditionalGeneration_Without_Padding"
-        )
+        "vllm_ascend.models.qwen3_vl:AscendQwen3VLForConditionalGeneration")

    # There is no PanguProMoEForCausalLM in vLLM, so we should register it before vLLM config initialization
    # to make sure the model can be loaded correctly. This register step can be removed once vLLM support PanguProMoEForCausalLM.