[Model][VLM][Patch]Modify ascend affinity _merge_multimodal_embeddings (#3071)

### What this PR does / why we need it?

This PR addresses an incompatibility on Ascend of the `.masked_scatter_`
operation used by the current `_merge_multimodal_embeddings` function.
As a temporary workaround, it reverts to the previous version of the CPU
operation, which can be executed asynchronously on the device side to
improve performance.
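The merge step can be illustrated with a minimal PyTorch sketch on CPU tensors. The function names and tensor shapes are assumptions for illustration, not the actual vLLM or vllm-ascend code. The sketch shows that a `.masked_scatter_`-based merge and a boolean-mask assignment (one way to avoid `.masked_scatter_`; this commit message does not say which exact form the patch uses) write the same rows into the same positions.

```python
import torch


def merge_with_masked_scatter(inputs_embeds, is_multimodal, mm_embeds):
    # Hypothetical upstream-style merge: copy mm_embeds, in order, into the
    # rows of inputs_embeds where is_multimodal is True.
    mask = is_multimodal.unsqueeze(-1)  # (num_tokens, 1) broadcasts over hidden dim
    return inputs_embeds.masked_scatter_(mask, mm_embeds.to(inputs_embeds.dtype))


def merge_with_bool_index(inputs_embeds, is_multimodal, mm_embeds):
    # Hypothetical alternative without masked_scatter_: boolean-mask assignment.
    # mm_embeds must have exactly is_multimodal.sum() rows.
    inputs_embeds[is_multimodal] = mm_embeds.to(inputs_embeds.dtype)
    return inputs_embeds


# Five text-token embeddings (hidden size 4); tokens 1, 2 and 4 are placeholders
# for multimodal content.
tokens = torch.zeros(5, 4)
is_mm = torch.tensor([False, True, True, False, True])
mm = torch.arange(12, dtype=torch.float32).reshape(3, 4)

a = merge_with_masked_scatter(tokens.clone(), is_mm, mm)
b = merge_with_bool_index(tokens.clone(), is_mm, mm)
print(torch.equal(a, b))
```

Both variants fill placeholder rows in order, so on a backend where one of them is problematic the other can serve as a drop-in substitute for this merge.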

- vLLM version: v0.10.2
- vLLM main: f225ea7dd9

---------

Signed-off-by: booker123456 <945658361@qq.com>
Author: Peipei, 2025-09-24 10:25:28 +08:00 (committed by GitHub)
Parent: b1380f3b87
Commit: c4b976af1a
3 changed files with 71 additions and 0 deletions


@@ -56,6 +56,18 @@
# Future Plan:
# Find a better way to support tensor alignment for 310p without this patch.
#
# ** File: platform/patch_common/patch_multimodal_merge.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.utils._merge_multimodal_embeddings`
# Why:
#    vLLM's `_merge_multimodal_embeddings` uses `.masked_scatter_`, which is incompatible with Ascend.
#    How:
#    Replace it with a CPU operation that can be executed asynchronously.
# Related PR (if no, explain why):
#    This bug is specific to Ascend, so it can't be fixed in vLLM.
# Future Plan:
# Identify this pattern in torch-npu and remove this patch.
#
# * Worker Patch:
# ===============
# ** File: worker/patch_common/patch_minicpm.py **