Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736)

### What this PR does / why we need it?

vllm-ascend v1 does not support multimodal models yet. This PR changes `model_runner_v1.py` to use the MRoPE feature, along with related changes, to add that support.

The support is not perfect yet: the Ascend operator does not support `window/full attn`, which would reduce Memcpy operations, so it can run out of memory if the input embedding is too large. For the same reason, we can't use `self._profile_multimodal()` for profiling, since it uses a large dummy input (i.e. images) as the multimodal input.

Fixes: https://github.com/vllm-project/vllm-ascend/issues/514

### Does this PR introduce _any_ user-facing change?

No, this feature does not require any user-facing change.

### How was this patch tested?

I tested this offline on my own 910B3 machine using my own fork, and it works well.

---------

Signed-off-by: cty <ctynb@qq.com>
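
For readers unfamiliar with the MRoPE feature mentioned above, here is a minimal, simplified sketch of how multimodal rotary position ids are commonly laid out for Qwen2-VL-style models: text tokens share one index across the temporal, height, and width axes, while image tokens keep a constant temporal index and vary along height and width. This is not the code in `model_runner_v1.py`; the function name `mrope_positions` and the `segments` input format are hypothetical, and video inputs are left out.

```python
from typing import List, Tuple, Union

# Hypothetical input format: ("text", n_tokens) or ("image", (grid_h, grid_w)),
# where the grid is assumed to be measured after spatial merging.
Segment = Tuple[str, Union[int, Tuple[int, int]]]


def mrope_positions(segments: List[Segment]) -> Tuple[List[int], List[int], List[int]]:
    """Return (temporal, height, width) position ids for a mixed sequence."""
    t_ids: List[int] = []
    h_ids: List[int] = []
    w_ids: List[int] = []
    next_pos = 0  # first position id available to the next segment

    for kind, spec in segments:
        if kind == "text":
            # Text tokens: all three axes carry the same 1D position.
            for i in range(spec):
                t_ids.append(next_pos + i)
                h_ids.append(next_pos + i)
                w_ids.append(next_pos + i)
            next_pos += spec
        elif kind == "image":
            grid_h, grid_w = spec
            # Image tokens: constant temporal id, row/column offsets on h/w.
            for row in range(grid_h):
                for col in range(grid_w):
                    t_ids.append(next_pos)
                    h_ids.append(next_pos + row)
                    w_ids.append(next_pos + col)
            # Following tokens resume after the largest id used by the image.
            next_pos += max(grid_h, grid_w)
        else:
            raise ValueError(f"unknown segment kind: {kind}")

    return t_ids, h_ids, w_ids


if __name__ == "__main__":
    t, h, w = mrope_positions([("text", 2), ("image", (2, 2)), ("text", 1)])
    print(t, h, w)
```

Because an image occupies only `max(grid_h, grid_w)` position steps rather than one per token, the position range grows more slowly than the token count for image-heavy prompts.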
2025-06-07 16:53:19 +08:00
parent 87ebaef4e4
commit 20dedba5d1
2 changed files with 268 additions and 7 deletions
--- a/tests/singlecard/test_offline_inference.py
+++ b/tests/singlecard/test_offline_inference.py
@@ -60,8 +60,6 @@ def test_models(model: str, dtype: str, max_tokens: int) -> None:


@pytest.mark.parametrize("model", MULTIMODALITY_MODELS)
-@pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "1",
-                    reason="qwen2.5_vl is not supported on v1")
 def test_multimodal(model, prompt_template, vllm_runner):
    image = ImageAsset("cherry_blossom") \
        .pil_image.convert("RGB")