[Bugfix] Fix DeepSeek V3.2 C8 precision via rotary tensor (#7537)
### What this PR does / why we need it?
During attention quantization in DeepSeek V3.2, the Hadamard matrix must
be retrieved from the weights to support the computation.
### Does this PR introduce _any_ user-facing change?
No, but the quantized weights will contain two new tensors.
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM main: 8b6325758c
---------
Signed-off-by: mayumeng <m30059191@china.huawei.com>
Co-authored-by: mayumeng <m30059191@china.huawei.com>
```diff
@@ -39,7 +39,16 @@ def patch_deepseek(module):

     def new_remap(name: str, params_dict: dict):
         name = ori_maybe_remap_kv_scale_name(name, params_dict)

-        replace_scale_names = ["fa_q.scale", "fa_k.scale", "fa_v.scale", "fa_q.offset", "fa_k.offset", "fa_v.offset"]
+        replace_scale_names = [
+            "fa_q.scale",
+            "fa_k.scale",
+            "fa_v.scale",
+            "fa_q.offset",
+            "fa_k.offset",
+            "fa_v.offset",
+            "indexer.q_rot",
+            "indexer.k_rot",
+        ]

         for scale_name in replace_scale_names:
             if name.endswith(scale_name):
```
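The suffix matching in `new_remap` can be sketched as a standalone helper. This is an illustration only: the real patch wraps vLLM's `ori_maybe_remap_kv_scale_name`, and the handling inside the `if` branch is not shown in the hunk, so the helper below (`matches_replace_name`, `REPLACE_SCALE_NAMES`) is a hypothetical reconstruction of just the matching step.

```python
# Sketch of the suffix-based tensor-name matching used by new_remap.
# The two "indexer.*_rot" entries are the rotary (Hadamard) tensors
# added by this PR; the rest are pre-existing FA scale/offset names.

REPLACE_SCALE_NAMES = [
    "fa_q.scale",
    "fa_k.scale",
    "fa_v.scale",
    "fa_q.offset",
    "fa_k.offset",
    "fa_v.offset",
    "indexer.q_rot",
    "indexer.k_rot",
]

def matches_replace_name(name: str) -> bool:
    """Return True if a checkpoint tensor name ends with one of the
    suffixes that new_remap handles specially."""
    return any(name.endswith(suffix) for suffix in REPLACE_SCALE_NAMES)
```

For example, a checkpoint entry like `model.layers.0.self_attn.indexer.q_rot` matches, while ordinary weight names fall through to the default remapping.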