For nz unset in bf16&fp16 (#4495)

### What this PR does / why we need it? disable NZ for float weight case. This is only a quick fix for dev branch. For main branch, we'll consider more case to make it more common. ### Does this PR introduce _any_ user-facing change?  ### How was this patch tested? qwen2.5 32B <img width="441" height="221" alt="image" src="https://github.com/user-attachments/assets/7ae18ffd-1ce2-43d9-9960-be45250ad0da" /> --------- Signed-off-by: 刘哲续 <liuzhexu1@huawei.com> Co-authored-by: 刘哲续 <liuzhexu1@huawei.com>
2025-11-28 17:32:25 +08:00
parent 96c362361e
commit 71acc8ddeb
10 changed files with 16 additions and 14 deletions
--- a/vllm_ascend/ops/common_fused_moe.py
+++ b/vllm_ascend/ops/common_fused_moe.py
@@ -76,7 +76,7 @@ class AscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
            w2_data = self._maybe_pad_weight(layer.w2_weight.data)
            layer.w2_weight = torch.nn.Parameter(w2_data, requires_grad=False)

-        if not is_310p() and is_enable_nz():
+        if not is_310p() and is_enable_nz(layer.w13_weight.data.dtype):
            layer.w13_weight.data = torch_npu.npu_format_cast(
                layer.w13_weight.data, ACL_FORMAT_FRACTAL_NZ)
            layer.w2_weight.data = torch_npu.npu_format_cast(