[BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619)

vLLM 0.11.0 did not include PR
(https://github.com/vllm-project/vllm/pull/25805), so the `prefix` is
missing from MTP's `SharedHead`. This PR fixes the bug with a patch to
vLLM's `deepseek_mtp` module.

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
Author: whx
Date: 2025-10-22 23:06:09 +08:00
Committed by: GitHub
Parent: 6e72bfdc50
Commit: 6464c97ff9
3 changed files with 70 additions and 0 deletions


@@ -146,3 +146,17 @@
# No, this need CANN add an aclnn shift operation
# Future Plan:
# Revert this when CANN support shift aclnn operation
#
# **File: worker/patch_deepseek_mtp.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.deepseek_mtp.DeepSeekMultiTokenPredictorLayer.__init__`
# Why:
#   The `__init__` method of `DeepSeekMultiTokenPredictorLayer` did not pass `prefix` to `SharedHead`.
# How:
#   Replace it with a new `__init__` that uses a new `SharedHead`, which passes `prefix` through to `ParallelLMHead`.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/25805
# Future Plan:
#   Remove this patch once the adapted vLLM version contains the above PR.
#
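The patching approach described above can be sketched as follows. This is a minimal, self-contained illustration of the monkey-patching technique, not the real vLLM code: the class names mirror vLLM's (`SharedHead`, `ParallelLMHead`) but their bodies are simplified stand-ins, and the example prefix string is hypothetical.

```python
# Minimal sketch of the monkey-patch technique, assuming simplified
# stand-in classes. In the real patch, quantization configs match
# weights by their full dotted name, so a dropped prefix breaks lookup.

class ParallelLMHead:
    def __init__(self, prefix: str = ""):
        self.prefix = prefix


class SharedHead:
    # Original (buggy) behavior: `prefix` is accepted but never
    # forwarded, so the inner head ends up with an empty prefix.
    def __init__(self, prefix: str = ""):
        self.head = ParallelLMHead()


def patched_shared_head_init(self, prefix: str = ""):
    # Patched behavior: forward the prefix so the inner head's full
    # name (e.g. "<layer_prefix>.head") is visible to quantization.
    self.head = ParallelLMHead(prefix=f"{prefix}.head")


# Apply the patch by replacing __init__ on the existing class; every
# construction site then picks up the fixed behavior without any
# caller changes.
SharedHead.__init__ = patched_shared_head_init

head = SharedHead(prefix="model.layers.61.shared_head")
print(head.head.prefix)  # model.layers.61.shared_head.head
```

Patching `__init__` on the class (rather than subclassing) is what lets the fix take effect inside `DeepSeekMultiTokenPredictorLayer` without modifying vLLM's source.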