Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736)

### What this PR does / why we need it?

vllm-ascend does not yet support multimodal models on the vLLM v1
engine. This PR changes `model_runner_v1.py` to add the MRoPE
(multimodal rotary position embedding) feature, among other changes, so
that this kind of model can run on v1. The support is not yet complete:
the Ascend operator does not support `window/full attn`, which would
reduce Memcpy operations, so the model can run out of memory if the
input embedding is too large. For the same reason we cannot use
`self._profile_multimodal()` for profiling, because it feeds a large
dummy multimodal input (i.e. images) to the model.
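To show what MRoPE adds over plain 1-D RoPE, here is a minimal sketch of how Qwen2-VL-style 3-D position ids (temporal, height, width) can be built for a prompt that mixes text and an image. This is not the code in `model_runner_v1.py` or vLLM's own MRoPE implementation. The `Segment` format and the `mrope_positions` helper are hypothetical, and the image grid is assumed to already be spatially merged to the size the language model sees. Video inputs, whose temporal ids Qwen2.5-VL derives from frame timing, are not covered.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment format (assumption, not a vLLM type):
#   ("text", n_tokens) or ("image", (t, h, w)), where (t, h, w) is the
#   token grid the language model sees after spatial merging.
Segment = Tuple[str, object]


def mrope_positions(segments: Sequence[Segment]) -> List[List[int]]:
    """Return three rows [temporal, height, width], one column per token."""
    t_pos: List[int] = []
    h_pos: List[int] = []
    w_pos: List[int] = []
    next_start = 0  # first position id available to the next segment
    for kind, spec in segments:
        if kind == "text":
            n = int(spec)
            ids = list(range(next_start, next_start + n))
            # Text tokens reuse one 1-D index on all three axes, like plain RoPE.
            t_pos += ids
            h_pos += ids
            w_pos += ids
            next_start += n
        elif kind == "image":
            t, h, w = spec
            # Image tokens are laid out row-major; each axis gets its grid index.
            for ti in range(t):
                for hi in range(h):
                    for wi in range(w):
                        t_pos.append(next_start + ti)
                        h_pos.append(next_start + hi)
                        w_pos.append(next_start + wi)
            # Following text starts right after the largest id the image used.
            next_start += max(t, h, w)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return [t_pos, h_pos, w_pos]


# Two text tokens, a 1x2x2 image grid, then one more text token.
positions = mrope_positions([("text", 2), ("image", (1, 2, 2)), ("text", 1)])
for row in positions:
    print(row)
```

Because each image token's ids grow only with its grid coordinates, a large image consumes far fewer position ids than tokens, while the attention cost still scales with the number of image tokens. That is why the size of the multimodal input, not its position range, drives the memory problem described above.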

Fixes: https://github.com/vllm-project/vllm-ascend/issues/514

### Does this PR introduce _any_ user-facing change?

No, this feature does not require any user-facing change.

### How was this patch tested?

I tested this offline on my own 910B3 machine using my fork, and it
works well.

---------

Signed-off-by: cty <ctynb@qq.com>
TaoYu Chen
2025-06-07 16:53:19 +08:00
committed by GitHub
parent 87ebaef4e4
commit 20dedba5d1
2 changed files with 268 additions and 7 deletions


@@ -60,8 +60,6 @@ def test_models(model: str, dtype: str, max_tokens: int) -> None:
@pytest.mark.parametrize("model", MULTIMODALITY_MODELS)
@pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "1",
                    reason="qwen2.5_vl is not supported on v1")
def test_multimodal(model, prompt_template, vllm_runner):
    image = ImageAsset("cherry_blossom") \
        .pil_image.convert("RGB")