[Bugfix] Fix DeepSeek V3.2 C8 precision via rotary tensor (#7537)
### What this PR does / why we need it?
During attention quantization in DeepSeek V3.2, the Hadamard matrix must
be retrieved from the weights to support the computation.
### Does this PR introduce _any_ user-facing change?
No, but the quantized weights will contain two new tensors.
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM main: 8b6325758c
---------
Signed-off-by: mayumeng <m30059191@china.huawei.com>
Co-authored-by: mayumeng <m30059191@china.huawei.com>
```diff
@@ -39,7 +39,16 @@ def patch_deepseek(module):

     def new_remap(name: str, params_dict: dict):
         name = ori_maybe_remap_kv_scale_name(name, params_dict)

-        replace_scale_names = ["fa_q.scale", "fa_k.scale", "fa_v.scale", "fa_q.offset", "fa_k.offset", "fa_v.offset"]
+        replace_scale_names = [
+            "fa_q.scale",
+            "fa_k.scale",
+            "fa_v.scale",
+            "fa_q.offset",
+            "fa_k.offset",
+            "fa_v.offset",
+            "indexer.q_rot",
+            "indexer.k_rot",
+        ]

         for scale_name in replace_scale_names:
             if name.endswith(scale_name):
```
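The suffix matching in `new_remap` can be sketched as a standalone helper. This is an illustration only: the real patch wraps vLLM's `ori_maybe_remap_kv_scale_name`, and the handling inside the `if` branch is not shown in the hunk, so the helper below (`matches_replace_name`, `REPLACE_SCALE_NAMES`) is a hypothetical reconstruction of just the matching step.

```python
# Sketch of the suffix-based tensor-name matching used by new_remap.
# The two "indexer.*_rot" entries are the rotary (Hadamard) tensors
# added by this PR; the rest are pre-existing FA scale/offset names.

REPLACE_SCALE_NAMES = [
    "fa_q.scale",
    "fa_k.scale",
    "fa_v.scale",
    "fa_q.offset",
    "fa_k.offset",
    "fa_v.offset",
    "indexer.q_rot",
    "indexer.k_rot",
]

def matches_replace_name(name: str) -> bool:
    """Return True if a checkpoint tensor name ends with one of the
    suffixes that new_remap handles specially."""
    return any(name.endswith(suffix) for suffix in REPLACE_SCALE_NAMES)
```

For example, a checkpoint entry like `model.layers.0.self_attn.indexer.q_rot` matches, while ordinary weight names fall through to the default remapping.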