The M4 quantization method in ModelSlim adds bias to model weights that
originally do not have a linear bias. PR #4235 supported PD-MIX
quantization and M4 quantization, adding bias to `w8a8.py` and
`w8a8_dynamic.py`, and implementing adaptations in `ops/linear.py` to
prevent it from being reset to `None` by
`self.register_parameter("bias", None)`. However, this modification
introduced an issue where the bias was still being reset to `None` in
certain scenarios, causing errors during service startup. Therefore,
support for M4 quantization is temporarily being reverted in this PR.
___
- vLLM version: v0.11.2
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>