[Bugfix] Fix qwen3-vl-moe shape error in the _prepare_inputs phase under high concurrency (#4658)

### What this PR does / why we need it?

We previously fixed a similar issue for qwen2.5-vl (https://github.com/vllm-project/vllm-ascend/issues/4430). The multimodal models in vllm v0.11.0 are all likely to have the same problem. This PR applies the fix specifically to qwen3-vl-moe.

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
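
The change in `model_runner_v1.py` extends an existing `model_type` check so that `qwen3_vl_moe` takes the same branch as `qwen2_5_vl`. A minimal sketch of that check written as a set membership test, which may be easier to extend if more model types need the same handling. The constant and helper names below are hypothetical and do not exist in vllm-ascend; only the two model type strings come from the diff.

```python
# Hypothetical names for illustration; not part of vllm-ascend.
# The two strings are the model types compared in the diff.
MM_EMBEDS_MODEL_TYPES = frozenset({"qwen2_5_vl", "qwen3_vl_moe"})


def uses_mm_embeds_branch(model_type: str) -> bool:
    """Return True if this model type takes the branch that passes
    multimodal embeddings to get_input_embeddings."""
    return model_type in MM_EMBEDS_MODEL_TYPES
```

With this sketch, `uses_mm_embeds_branch("qwen3_vl_moe")` is true, and any model type not in the set falls through to the other branch.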
2025-12-08 19:30:16 +08:00
parent d412565ec9
commit 4e728f1f40
2 changed files with 113 additions and 4 deletions
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -1395,7 +1395,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
            # as input to the multimodal model, even when the input is text.
            input_ids = self.input_ids[:total_num_scheduled_tokens]
            model_type = self.vllm_config.model_config.hf_config.model_type
-            if model_type == "qwen2_5_vl":
+            if model_type == "qwen2_5_vl" or model_type == "qwen3_vl_moe":
                inputs_embeds = self.model.get_input_embeddings(
                    input_ids,
                    multimodal_embeddings=mm_embeds,