[bugfix] Fix failure to load w4a8 weights when EP is not enabled. (#7090)
### What this PR does / why we need it?
This is a bug fix for the issue where MoE models fail to load quantized
weights in w4a8 format when EP (expert parallelism) is not enabled. The
parameters `weight_scale_second`, `weight_offset_second`, and `scale_bias`
must always be parsed in per-group mode, regardless of other conditions.
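
For illustration, a minimal standalone sketch of the behavioral difference (here `group_size` stands in for `self.quant_method.group_size`; the parameter names follow the diff below):

```python
def per_group_params_old(group_size):
    # Before the fix: the three second-stage params were only treated as
    # per-group when a positive group_size was configured.
    return (
        ["weight_scale_second", "weight_offset_second", "scale_bias"]
        + ["weight_scale", "weight_offset"]
        if group_size and group_size > 0
        else []
    )

def per_group_params_new(group_size):
    # After the fix: the three second-stage params are always per-group;
    # only weight_scale/weight_offset stay conditional on group_size.
    return ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
        ["weight_scale", "weight_offset"]
        if group_size and group_size > 0
        else []
    )

# Without a positive group_size (the failing non-EP w4a8 path), the old
# code returned [], so the second-stage scales were not parsed per-group.
assert per_group_params_old(None) == []
assert per_group_params_new(None) == [
    "weight_scale_second", "weight_offset_second", "scale_bias"
]
```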
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
```diff
@@ -220,8 +220,8 @@ class AscendFusedMoEMethod(FusedMoEMethodBase):
         set_weight_attrs(param, extra_weight_attrs)

         extra_weight_attrs.update({"quant_method": FusedMoeWeightScaleSupported.CHANNEL.value})
-        per_group_param = (
-            ["weight_scale_second", "weight_offset_second", "scale_bias"] + ["weight_scale", "weight_offset"]
+        per_group_param = ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
+            ["weight_scale", "weight_offset"]
             if hasattr(self.quant_method, "group_size") and self.quant_method.group_size > 0
             else []
         )
```
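
How `per_group_param` is consumed is not shown in this hunk. The sketch below is a hypothetical illustration of the surrounding pattern, assuming the loader switches a parameter's `quant_method` attribute from the per-channel default to per-group when its name matches the list; `pick_quant_method` and the simplified enum are stand-ins, not the actual vllm-ascend code:

```python
from enum import Enum

class FusedMoeWeightScaleSupported(Enum):
    # Simplified stand-in for the vLLM enum referenced in the diff.
    CHANNEL = "channel"
    GROUP = "group"

def pick_quant_method(param_name: str, per_group_param: list[str]) -> str:
    # Hypothetical sketch: params whose names match a per-group entry are
    # parsed per-group; everything else keeps the per-channel default.
    if any(key in param_name for key in per_group_param):
        return FusedMoeWeightScaleSupported.GROUP.value
    return FusedMoeWeightScaleSupported.CHANNEL.value

# With the fix, "scale_bias" is always routed to per-group parsing,
# even when no group_size is configured (EP disabled).
per_group = ["weight_scale_second", "weight_offset_second", "scale_bias"]
assert pick_quant_method("w13_scale_bias", per_group) == "group"
assert pick_quant_method("w2_weight_scale", per_group) == "channel"
```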