[Model][MiniCPM] support MiniCPM (#645)

### What this PR does / why we need it?
This pr support minicpm in branch main. see
https://github.com/vllm-project/vllm-ascend/pull/164


### How was this patch tested?
test locally with minicpm

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
This commit is contained in:
Mengqing Cao
2025-04-27 11:27:24 +08:00
committed by GitHub
parent 742f679c7d
commit ba3d8aae94
5 changed files with 134 additions and 0 deletions

View File

@@ -96,6 +96,8 @@
# Related PR (if no, explain why): no related PR, we want add this ability into vllm
# Future Plan:
# Remove those patch when vllm merged them
#
#
# * Worker Patch:
# ===============
# ** File: worker/patch_0_8_4/patch_metrics.py **
@@ -125,6 +127,20 @@
# Future Plan:
# Revert it when the related pr is merged in vllm.
#
# ** File: worker/patch_common/patch_minicpm.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.minicpm.MiniCPMAttention.forward`
# Why:
# The forward func of MiniCPMAttention in vllm do a datatype convert
# (original datatype --> float32) to ensure the precision on cuda.
# However float32 is not supported in cann rope op, thus we keep this patch
# How
# Removed the dtype convert operations in forward
# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
# NO, only for npu due to rope op.
# Future Plan:
# Keep this patch in vllm-ascend.
#
# ** File: worker/patch_common/patch_multi_step_worker.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.spec_decode.multi_step_worker.MultiStepWorker.sampler_output`
@@ -156,3 +172,15 @@
# - https://github.com/vllm-project/vllm-ascend/pull/395
# Future Plan:
# Revert it when the related pr is merged in vllm and vllm-ascend.
#
# ** File: worker/patch_0_8_4/patch_tritonplaceholder.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `triton` Module
# Why:
# Triton is not supported on npu currently, importing triton will break vllm-ascend
# How
# ditto
# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
# TritonPlaceholder is only available in vllm>0.8.4
# Future Plan:
# Revert it when branch main doesn't maintain v0.8.4.