[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)

### What this PR does / why we need it?

When the GLM5 target model uses rotary quant, the final hidden states passed to MTP must go through an extra rotary step so that they match the rotary-quantized MTP weights.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: Wangbingjie <wangbj1207@126.com>
Signed-off-by: wangbj127 <256472688+wangbj127@users.noreply.github.com>
2026-03-12 20:01:24 +08:00
parent de93790d08
commit 0c659e91ed
3 changed files with 96 additions and 0 deletions
--- a/vllm_ascend/patch/worker/__init__.py
+++ b/vllm_ascend/patch/worker/__init__.py
@@ -44,3 +44,4 @@ import vllm_ascend.patch.worker.patch_npugraph_ex_triton  # noqa
 import vllm_ascend.patch.worker.patch_kimi_k25  # noqa
 import vllm_ascend.patch.worker.patch_draft_quarot  # noqa
 import vllm_ascend.patch.worker.patch_cudagraph  # noqa
+import vllm_ascend.patch.worker.patch_deepseek_mtp  # noqa
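
To illustrate why the hidden states need the extra rotary, here is a minimal sketch in pure Python. It is not the code from this PR. It assumes the rotation is an orthogonal (normalized Hadamard) matrix applied to the input dimension of the weights, and every name in it (`hadamard`, `rotate_weight`, `linear`, `hidden`) is hypothetical. Under that assumption, a layer with rotated weights only reproduces the original output when its input is rotated by the same matrix.

```python
import math


def hadamard(n):
    """Return a normalized n x n Hadamard matrix (n must be a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def vecmat(x, m):
    """Compute the row vector x @ m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def linear(x, w):
    """Compute x @ w^T for a weight w of shape (out_features, in_features)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def rotate_weight(w, q):
    """Rotate the input dimension of w, i.e. compute w @ q (assumed layout)."""
    return [vecmat(row, q) for row in w]


q = hadamard(4)
w = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.0, -0.75, 3.0]]  # toy weights
hidden = [1.0, 2.0, -3.0, 0.5]  # toy "final hidden states" from the target model

reference = linear(hidden, w)  # output with unrotated weights
w_rot = rotate_weight(w, q)  # weights stored in the rotated basis

# Feeding unrotated hidden states into rotated weights gives a different result.
mismatched = linear(hidden, w_rot)

# Rotating the hidden states first restores the reference output,
# because q @ q^T is the identity for an orthogonal q.
matched = linear(vecmat(hidden, q), w_rot)

print("reference :", reference)
print("mismatched:", mismatched)
print("matched   :", matched)
```

In this toy setup `matched` agrees with `reference` up to floating-point error while `mismatched` does not, which is the kind of precision problem the PR title describes.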