[Model][VLM][Patch]Modify ascend affinity _merge_multimodal_embeddings (#3071)

### What this PR does / why we need it?

This PR aims to address the incompatibility of the `.masked_scatter_` operation in the current `_merge_multimodal_embeddings` function on Ascend. For now, it reverts to the previous version of the CPU operation, which can be executed asynchronously on the device side to enhance performance.

- vLLM version: v0.10.2
- vLLM main: f225ea7dd9

---------

Signed-off-by: booker123456 <945658361@qq.com>
2025-09-24 10:25:28 +08:00
parent b1380f3b87
commit c4b976af1a
3 changed files with 71 additions and 0 deletions
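
The commit description contrasts `.masked_scatter_` with an older merge approach. As a rough illustration only (this is not the code added by this PR), the sketch below shows one way to merge multimodal embeddings with boolean-mask index assignment instead of `.masked_scatter_`. The function name `merge_by_index_assignment` and its parameter names are hypothetical, and the example runs on plain CPU tensors rather than on an Ascend device.

```python
import torch


def merge_by_index_assignment(inputs_embeds, is_multimodal, multimodal_embeddings):
    """Hypothetical sketch: write multimodal rows into placeholder positions.

    inputs_embeds:         (num_tokens, hidden) tensor, modified in place
    is_multimodal:         (num_tokens,) bool mask marking placeholder tokens
    multimodal_embeddings: list of (n_i, hidden) tensors, in prompt order
    """
    hidden = inputs_embeds.shape[-1]
    # Flatten all multimodal items into a single (total_rows, hidden) tensor.
    flattened = torch.cat(
        [e.reshape(-1, hidden) for e in multimodal_embeddings], dim=0
    )
    num_expected = int(is_multimodal.sum())
    if flattened.shape[0] != num_expected:
        raise ValueError(
            f"expected {num_expected} multimodal rows, got {flattened.shape[0]}"
        )
    # Boolean-mask assignment instead of Tensor.masked_scatter_.
    inputs_embeds[is_multimodal] = flattened.to(dtype=inputs_embeds.dtype)
    return inputs_embeds


if __name__ == "__main__":
    embeds = torch.zeros(5, 2)
    mask = torch.tensor([False, True, True, False, True])
    items = [torch.ones(2, 2), torch.full((1, 2), 2.0)]
    merged = merge_by_index_assignment(embeds, mask, items)
    print(merged[:, 0].tolist())  # [0.0, 1.0, 1.0, 0.0, 2.0]
```

On CPU this yields the same result as `masked_scatter_` with the mask broadcast over the hidden dimension; whether it behaves better on a given accelerator backend is an assumption that would need to be checked there.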
--- a/vllm_ascend/patch/init.py
+++ b/vllm_ascend/patch/init.py
@@ -56,6 +56,18 @@
 #    Future Plan:
 #       Find a better way to support tensor alignment for 310p without this patch.
 #
+# ** File: platform/patch_common/patch_multimodal_merge.py **
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#   1. `vllm.model_executor.models.utils._merge_multimodal_embeddings`
+#    Why:
+#       The '_merge_multimodal_embeddings' function in vLLM uses `.masked_scatter_`,
+#       which is incompatible with Ascend.
+#    How:
+#       Replace it with a CPU operation that can be executed asynchronously.
+#    Related PR (if no, explain why):
+#       This bug is specific to Ascend, so it can't be fixed in vLLM.
+#    Future Plan:
+#       Identify this pattern in torch-npu and remove this patch.
+#
 # * Worker Patch:
 # ===============
 # ** File: worker/patch_common/patch_minicpm.py **