[Model][MiniCPM] support MiniCPM (#645)

### What this PR does / why we need it?
This PR adds MiniCPM support on the main branch. See https://github.com/vllm-project/vllm-ascend/pull/164

### How was this patch tested?
Tested locally with MiniCPM.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-27 11:27:24 +08:00
parent 742f679c7d
commit ba3d8aae94
5 changed files with 134 additions and 0 deletions
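
The patch note for `worker/patch_common/patch_minicpm.py` in this change says the float32 conversion in `MiniCPMAttention.forward` is removed. A minimal sketch of that idea follows. It assumes the upstream forward upcasts before the rope call and casts back afterwards; `FakeTensor`, `Attention`, and `forward_without_upcast` are hypothetical stand-ins for illustration, not the real vLLM or vllm-ascend code.

```python
# Hypothetical stand-ins; not the real vLLM or vllm-ascend classes.
class FakeTensor:
    """Minimal tensor stand-in that only tracks its dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        return FakeTensor(dtype)


class Attention:
    """Stand-in for an upstream attention layer."""

    def rotary_emb(self, q, k):
        # Record the dtype that the rope op actually receives.
        self.rope_input_dtype = q.dtype
        return q, k

    def forward(self, q, k):
        # Assumed upstream behavior: upcast to float32 around rope, then cast back.
        orig_dtype = q.dtype
        q, k = q.to("float32"), k.to("float32")
        q, k = self.rotary_emb(q, k)
        return q.to(orig_dtype), k.to(orig_dtype)


def forward_without_upcast(self, q, k):
    # Patched behavior: call rope in the original dtype, with no float32 round trip.
    return self.rotary_emb(q, k)


# Apply the patch by rebinding the method on the class.
Attention.forward = forward_without_upcast

layer = Attention()
q_out, k_out = layer.forward(FakeTensor("float16"), FakeTensor("float16"))
```

Rebinding the method on the class leaves the rest of the layer untouched, which keeps a patch of this kind small and easy to drop later.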
--- a/vllm_ascend/patch/__init__.py
+++ b/vllm_ascend/patch/__init__.py
@@ -96,6 +96,8 @@
 #    Related PR (if no, explain why): no related PR, we want add this ability into vllm
 #    Future Plan:
 #       Remove those patch when vllm merged them
+#
+#
 # * Worker Patch:
 # ===============
 # ** File: worker/patch_0_8_4/patch_metrics.py **
@@ -125,6 +127,20 @@
 #    Future Plan:
 #       Revert it when the related pr is merged in vllm.
 #
+# ** File: worker/patch_common/patch_minicpm.py **
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#   1. `vllm.model_executor.models.minicpm.MiniCPMAttention.forward`
+#    Why:
+#       The forward function of MiniCPMAttention in vllm converts the dtype
+#       (original dtype --> float32) to preserve precision on CUDA.
+#       However, the CANN rope op does not support float32, so we keep this patch.
+#    How：
+#       Remove the dtype conversion in forward.
+#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#       No related PR; this patch is NPU-only because of the rope op limitation.
+#    Future Plan:
+#       Keep this patch in vllm-ascend.
+#
 # ** File: worker/patch_common/patch_multi_step_worker.py **
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #   1. `vllm.spec_decode.multi_step_worker.MultiStepWorker.sampler_output`
@@ -156,3 +172,15 @@
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
 #       Revert it when the related pr is merged in vllm and vllm-ascend.
+#
+# ** File: worker/patch_0_8_4/patch_tritonplaceholder.py **
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#   1. `triton` Module
+#    Why:
+#       Triton is not currently supported on NPU, and importing triton breaks vllm-ascend.
+#    How：
+#       ditto
+#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#       TritonPlaceholder is only available in vllm>0.8.4
+#    Future Plan:
+#       Revert it once the main branch no longer maintains v0.8.4.
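
The `triton` note above only states that importing the module breaks on NPU. One general way to guard against a missing module is to register a placeholder in `sys.modules` before anything imports it. The sketch below shows that pattern under stated assumptions: `install_placeholder`, `_passthrough`, and the module name `hypothetical_missing_backend` are made up for illustration, and this is not the actual `TritonPlaceholder` implementation from vLLM.

```python
import importlib.util
import sys
import types


def _passthrough(fn=None, **_kwargs):
    """No-op decorator that works as @deco and as @deco(...)."""
    if fn is None:
        return lambda f: f
    return fn


def install_placeholder(name: str) -> bool:
    """Register an empty stand-in module if `name` cannot be imported.

    Returns True when a placeholder was installed, False otherwise.
    """
    if name in sys.modules:
        return False  # Already imported or already replaced.
    if importlib.util.find_spec(name) is not None:
        return False  # The real package is installed; leave it alone.
    module = types.ModuleType(name)
    # Hypothetical attribute: lets decorator-style usage import cleanly.
    module.jit = _passthrough
    sys.modules[name] = module
    return True


# The module name here is deliberately fake so the example behaves the same everywhere.
installed = install_placeholder("hypothetical_missing_backend")
```

After the call, `import hypothetical_missing_backend` succeeds and decorated functions are returned unchanged. Code that actually calls into the missing backend would still fail, so a placeholder like this only protects import time.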