[BugFix][main] Fix quantization related mtp bug with patch (#3620)

vLLM 0.11.0 didn't bring PR (https://github.com/vllm-project/vllm/pull/25805) thus missing the prefix of mtp's SharedHead. This PR fixes this bug with a patch to vllm's deepseek_mtp. main also need this bugfix to support vllm's v0.11.0 - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-23 09:54:31 +08:00
parent 4381d296e5
commit 72695c97d0
3 changed files with 73 additions and 0 deletions
--- a/vllm_ascend/patch/worker/init.py
+++ b/vllm_ascend/patch/worker/init.py
@@ -27,3 +27,8 @@ import vllm_ascend.patch.worker.patch_roberta  # noqa
 import vllm_ascend.patch.worker.patch_weight_loader  # noqa
 import vllm_ascend.patch.worker.patch_multimodal_merge  # noqa
 import vllm_ascend.patch.worker.patch_minicpm  # noqa
+
+from vllm_ascend.utils import vllm_version_is
+
+if vllm_version_is("0.11.0"):
+    import vllm_ascend.patch.worker.patch_deepseek_mtp  # noqa