[Bugfix] Fix qwen3-vl-moe shape error during the _prepare_inputs phase under high concurrency. (#4658)

### What this PR does / why we need it?
Earlier we fixed a similar issue for qwen2.5-vl
(https://github.com/vllm-project/vllm-ascend/issues/4430). The multimodal
models in vllm v0.11.0 should all be affected by the same problem. This PR
applies the fix specifically to qwen3-vl-moe.
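
As a rough illustration only, the change extends the model-type check in `NPUModelRunner` so that `qwen3_vl_moe` takes the same `get_input_embeddings(..., multimodal_embeddings=...)` branch as `qwen2_5_vl`. The sketch below expresses that condition as a set lookup. The names `MM_EMBED_MODEL_TYPES` and `takes_mm_embedding_branch` are hypothetical and do not exist in vllm-ascend; the PR itself uses a plain `or` comparison.

```python
# Hypothetical helper, not part of vllm-ascend: it only mirrors the
# condition touched by this PR.
MM_EMBED_MODEL_TYPES = frozenset({"qwen2_5_vl", "qwen3_vl_moe"})


def takes_mm_embedding_branch(model_type: str) -> bool:
    """Return True for model types covered by the patched check."""
    return model_type in MM_EMBED_MODEL_TYPES


if __name__ == "__main__":
    for name in ("qwen2_5_vl", "qwen3_vl_moe", "llama"):
        print(name, takes_mm_embedding_branch(name))
```

If other multimodal models in vllm v0.11.0 turn out to need the same treatment, a lookup like this would only need a new entry in the set, though whether that is appropriate for each model is not established here.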

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
Levi
2025-12-08 19:30:16 +08:00
committed by GitHub
parent d412565ec9
commit 4e728f1f40
2 changed files with 113 additions and 4 deletions

@@ -1395,7 +1395,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         # as input to the multimodal model, even when the input is text.
         input_ids = self.input_ids[:total_num_scheduled_tokens]
         model_type = self.vllm_config.model_config.hf_config.model_type
-        if model_type == "qwen2_5_vl":
+        if model_type == "qwen2_5_vl" or model_type == "qwen3_vl_moe":
             inputs_embeds = self.model.get_input_embeddings(
                 input_ids,
                 multimodal_embeddings=mm_embeds,