Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736)
### What this PR does / why we need it?
vllm-ascend v1 does not yet support multimodal models. This PR changes `model_runner_v1.py` to add that support, mainly by wiring up the MRoPE feature. The support is still imperfect: the Ascend operator does not yet support `window/full attn` to reduce Memcpy operations, so a very large input embedding can run out of memory. For that reason we cannot use `self._profile_multimodal()` for profiling, since it feeds a large dummy multimodal input (i.e. images).

Fixes: https://github.com/vllm-project/vllm-ascend/issues/514

### Does this PR introduce _any_ user-facing change?
No, this feature requires no user-facing changes.

### How was this patch tested?
I tested this offline on my 910B3 machine with my own fork, and it works well.

---------

Signed-off-by: cty <ctynb@qq.com>
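To illustrate the MRoPE feature the PR relies on: Qwen2-VL-style multimodal rotary embeddings assign each token a 3D position (temporal, height, width). Text tokens use the same index on all three axes, while image tokens keep a fixed temporal index and spread over the height/width grid. Below is a simplified, self-contained sketch of that scheme; the function name and signature are illustrative and are not the actual vllm-ascend code.

```python
def mrope_positions(text_before, grid_h, grid_w, text_after):
    """Sketch of 3D MRoPE position ids for [text, image, text].

    Returns a list of (t, h, w) triples, one per token. Simplified:
    real implementations also handle spatial merging and video frames.
    """
    pos = []
    # Text tokens before the image: all three axes share one index.
    for i in range(text_before):
        pos.append((i, i, i))
    # Image tokens: temporal index is fixed at the image's start offset;
    # height/width indices follow the patch grid.
    t0 = text_before
    for h in range(grid_h):
        for w in range(grid_w):
            pos.append((t0, t0 + h, t0 + w))
    # Text after the image resumes from the max position used so far + 1.
    nxt = max(max(p) for p in pos) + 1
    for i in range(text_after):
        pos.append((nxt + i, nxt + i, nxt + i))
    return pos
```

For example, two text tokens followed by a 2x2 image grid and one trailing text token yield positions `(0,0,0), (1,1,1)` for the leading text, `(2,2,2)…(2,3,3)` for the grid, and `(4,4,4)` for the trailing token.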
```diff
@@ -60,8 +60,6 @@ def test_models(model: str, dtype: str, max_tokens: int) -> None:

 @pytest.mark.parametrize("model", MULTIMODALITY_MODELS)
-@pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "1",
-                    reason="qwen2.5_vl is not supported on v1")
 def test_multimodal(model, prompt_template, vllm_runner):
     image = ImageAsset("cherry_blossom") \
         .pil_image.convert("RGB")
```