[Bugfix] fix qwen3-vl-moe shape ERROR during the _prepare_inputs phase under high concurrency. (#4658)
### What this PR does / why we need it?

Earlier we fixed a similar issue for qwen2.5-vl (https://github.com/vllm-project/vllm-ascend/issues/4430). The multimodal models in vllm v0.11.0 are all expected to have the same problem. This PR applies the fix specifically to qwen3-vl-moe.

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
@@ -1395,7 +1395,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
 # as input to the multimodal model, even when the input is text.
 input_ids = self.input_ids[:total_num_scheduled_tokens]
 model_type = self.vllm_config.model_config.hf_config.model_type
-if model_type == "qwen2_5_vl":
+if model_type == "qwen2_5_vl" or model_type == "qwen3_vl_moe":
     inputs_embeds = self.model.get_input_embeddings(
         input_ids,
         multimodal_embeddings=mm_embeds,
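
The condition changed in this hunk is a check of `model_type` against two known values. As a minimal sketch, the same check can be written as a set membership test, which keeps the branch readable if more model types turn out to need it. The helper name `matches_patched_branch` and the constant `PATCHED_MODEL_TYPES` are hypothetical and are not part of vllm-ascend. Only the two model-type strings come from the diff.

```python
# Hypothetical illustration, not code from vllm-ascend.
# The two strings are the model types named in the diff; the constant and
# function names are assumptions made for this sketch.
PATCHED_MODEL_TYPES = frozenset({"qwen2_5_vl", "qwen3_vl_moe"})


def matches_patched_branch(model_type: str) -> bool:
    """Return True when model_type would enter the branch shown in the diff."""
    return model_type in PATCHED_MODEL_TYPES


if __name__ == "__main__":
    for name in ("qwen2_5_vl", "qwen3_vl_moe", "some_other_model"):
        print(name, matches_patched_branch(name))
```

A membership test behaves the same as the chained `or` comparison in the commit for these two values. It is shown here only as an alternative spelling.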