[Bugfix] Remove ModelSlim-"M4 Quantization". (#4589)
The M4 quantization method in ModelSlim adds a bias to weights whose original linear layers have no bias. PR #4235 added support for PD-MIX quantization and M4 quantization: it added bias handling to `w8a8.py` and `w8a8_dynamic.py` and adapted `ops/linear.py` so that the bias would not be reset to `None` by `self.register_parameter("bias", None)`. However, that change still left scenarios in which the bias was reset to `None`, causing errors during service startup. This PR therefore temporarily reverts M4 quantization support.

___

- vLLM version: v0.11.2

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
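For context, the failure mode can be illustrated with a minimal sketch. The class and method names below are hypothetical stand-ins, not the actual vLLM Ascend implementations: a quantization method attaches a bias parameter to a layer whose original linear op has no bias, and the layer later calls `self.register_parameter("bias", None)`, which silently replaces the attached parameter and breaks any later code that dereferences `layer.bias`.

```python
# Minimal sketch of the bias-reset interaction described above.
# FakeQuantMethod / FakeLinear are illustrative stand-ins, not the
# real vllm-ascend classes.
import torch
import torch.nn as nn


class FakeQuantMethod:
    """Stands in for a quant method that injects a bias (as M4 did)."""

    def create_weights(self, layer: nn.Module, output_size: int) -> None:
        # Attach a bias even though the original linear layer has none.
        layer.register_parameter(
            "bias",
            nn.Parameter(torch.zeros(output_size), requires_grad=False))


class FakeLinear(nn.Module):
    """Stands in for a linear layer that was created without a bias."""

    def __init__(self, output_size: int, quant_method: FakeQuantMethod):
        super().__init__()
        quant_method.create_weights(self, output_size)
        # If this runs after the quant method attached a bias, the
        # parameter is silently replaced with None ...
        self.register_parameter("bias", None)


layer = FakeLinear(8, FakeQuantMethod())
print(layer.bias)  # None -> any later `layer.bias.data` access fails
```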
@@ -87,7 +87,6 @@ class AscendW8A8LinearMethod:
         params_dict["weight_offset"] = torch.empty(output_size,
                                                    1,
                                                    dtype=params_dtype)
-        params_dict["bias"] = torch.zeros(output_size, dtype=torch.float32)
         return params_dict
 
     def get_pergroup_param(self,
@@ -199,13 +198,7 @@ class AscendW8A8LinearMethod:
             layer.weight.data, ACL_FORMAT_FRACTAL_NZ)
         layer.weight_scale.data = torch.flatten(layer.weight_scale.data)
         layer.weight_offset.data = torch.flatten(layer.weight_offset.data)
-        layer.bias.data = layer.bias.data.to(layer.weight_scale.data.dtype)
-
-        try:
-            ascend_quant_method = getattr(layer, "ascend_quant_method")
-        except AttributeError:
-            ascend_quant_method = ""
-
+        ascend_quant_method = getattr(layer, "ascend_quant_method", "")
         if ascend_quant_method == COMPRESSED_TENSORS_METHOD:
             deq_scale = layer.input_scale.data * layer.weight_scale.data
             layer.deq_scale = torch.nn.Parameter(deq_scale,
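One incidental effect of the second hunk: the removed `try`/`except AttributeError` lookup and the added `getattr` call with a default behave identically, so dropping the M4 bias handling also simplifies that lookup. A quick sketch (using a plain placeholder object rather than the real layer type):

```python
# getattr with a default returns "" when the attribute is missing,
# which is exactly what the removed try/except produced.
# DummyLayer is a placeholder, not the real layer class.
class DummyLayer:
    pass


layer = DummyLayer()

try:
    via_try = getattr(layer, "ascend_quant_method")
except AttributeError:
    via_try = ""

via_default = getattr(layer, "ascend_quant_method", "")

assert via_try == via_default == ""
```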