[Bugfix] Remove ModelSlim "M4 Quantization" support (#4589)
The M4 quantization method in ModelSlim adds a bias to model weights whose linear layers originally have no bias. PR #4235 added support for PD-MIX quantization and M4 quantization: it introduced this bias in `w8a8.py` and `w8a8_dynamic.py`, and adapted `ops/linear.py` so that the bias would not be reset to `None` by `self.register_parameter("bias", None)`. However, in certain scenarios the bias was still reset to `None`, causing errors during service startup. This PR therefore temporarily reverts support for M4 quantization.
___
- vLLM version: v0.11.2

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
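To make the failure mode concrete, the following is a minimal, self-contained Python sketch of the interaction described above. It is not the vllm-ascend implementation: `QuantLinearMethodSketch`, `LinearLayerSketch`, and their signatures are hypothetical stand-ins, and the sketch only assumes the behaviour stated in this message, namely that the layer resets its bias via `self.register_parameter("bias", None)` after the quantization method has already created one.

# Hypothetical sketch only; class and method names are illustrative stand-ins,
# not the real vllm-ascend APIs.
import torch
import torch.nn as nn


class QuantLinearMethodSketch:
    """Stand-in for a quant method that registers an extra float32 bias."""

    def create_weights(self, layer: nn.Module, output_size: int) -> None:
        # Analogue of: params_dict["bias"] = torch.zeros(output_size, dtype=torch.float32)
        layer.register_parameter(
            "bias",
            nn.Parameter(torch.zeros(output_size, dtype=torch.float32),
                         requires_grad=False))

    def process_weights_after_loading(self, layer: nn.Module) -> None:
        # Analogue of: layer.bias.data = layer.bias.data.to(...); assumes bias exists.
        layer.bias.data = layer.bias.data.to(torch.float16)


class LinearLayerSketch(nn.Module):
    """Stand-in for a linear layer whose checkpoint has no linear bias."""

    def __init__(self, output_size: int, has_bias: bool,
                 quant_method: QuantLinearMethodSketch) -> None:
        super().__init__()
        self.quant_method = quant_method
        # The quant method creates its extra bias first ...
        self.quant_method.create_weights(self, output_size)
        if not has_bias:
            # ... and the layer then resets it, discarding the parameter.
            self.register_parameter("bias", None)


layer = LinearLayerSketch(output_size=4, has_bias=False,
                          quant_method=QuantLinearMethodSketch())
try:
    layer.quant_method.process_weights_after_loading(layer)
except AttributeError as exc:
    # 'NoneType' object has no attribute 'data' -> startup-time failure.
    print(f"reproduced failure: {exc}")

Under that assumption, `process_weights_after_loading` hits a `None` bias and fails, which is the kind of startup-time error that motivates reverting the M4 support here.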
@@ -60,7 +60,6 @@ class AscendW8A8DynamicLinearMethod:
         params_dict["weight_offset"] = torch.empty(output_size,
                                                    1,
                                                    dtype=params_dtype)
-        params_dict["bias"] = torch.zeros(output_size, dtype=torch.float32)
         return params_dict
 
     def get_pergroup_param(self,
@@ -98,7 +97,6 @@ class AscendW8A8DynamicLinearMethod:
         layer.weight_scale.data = layer.weight_scale.data.flatten()
         layer.weight_scale_fp32 = layer.weight_scale.data.to(torch.float32)
         layer.weight_offset.data = layer.weight_offset.data.flatten()
-        layer.bias.data = layer.bias.data.to(layer.weight_scale.data.dtype)
 
 
 class AscendW8A8DynamicFusedMoEMethod: