[BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619)
vLLM 0.11.0 did not include PR (https://github.com/vllm-project/vllm/pull/25805), so the prefix of MTP's SharedHead is missing. This PR fixes the bug by applying a patch to vLLM's deepseek_mtp. --------- Signed-off-by: whx-sjtu <2952154980@qq.com>
@@ -146,3 +146,17 @@
# No, this needs CANN to add an aclnn shift operation
# Future Plan:
# Revert this when CANN supports the shift aclnn operation
#
# ** File: worker/patch_deepseek_mtp.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.deepseek_mtp.DeepSeekMultiTokenPredictorLayer.__init__`
# Why:
# The `__init__` func of DeepSeekMultiTokenPredictorLayer doesn't pass the prefix to SharedHead.
# How:
# Replace it with a new `__init__`.
# Use a new SharedHead which passes the prefix to ParallelLMHead.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/25805
# Future Plan:
# Remove this patch once the adapted vLLM version contains the above PR.
#
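The patch described above follows the usual monkey-patch pattern: replace the buggy `__init__` on the class with a version that forwards the `prefix` argument. The sketch below illustrates the pattern with toy stand-in classes (not the real vllm modules; class and attribute names here are illustrative assumptions), showing how forwarding the prefix makes the qualified weight name resolve:

```python
# Toy stand-ins for the vllm classes involved (illustrative only).
class ParallelLMHead:
    def __init__(self, prefix=""):
        # In vllm, the prefix is used to locate quantized weights by name.
        self.prefix = prefix

class SharedHead:
    def __init__(self, prefix=""):
        # Forward a qualified prefix down to the LM head.
        self.head = ParallelLMHead(prefix=f"{prefix}.head" if prefix else "head")

class PredictorLayer:
    def __init__(self, prefix=""):
        # Buggy original: the prefix is dropped, so SharedHead's weights
        # end up with an unqualified name and quantized loading breaks.
        self.shared_head = SharedHead()

def patched_init(self, prefix=""):
    # Fixed version: forward the prefix, as the upstream PR does.
    self.shared_head = SharedHead(
        prefix=f"{prefix}.shared_head" if prefix else "shared_head")

# Apply the patch the way a patch_deepseek_mtp.py module would:
# rebind the class's __init__ before any layer is constructed.
PredictorLayer.__init__ = patched_init

layer = PredictorLayer(prefix="model.layers.0")
print(layer.shared_head.head.prefix)  # model.layers.0.shared_head.head
```

Because the patch rebinds `__init__` on the class itself, it must be imported before the model is instantiated; every layer built afterwards picks up the corrected prefix.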