[bugfix] Fix failure to load w4a8 weights when EP is not enabled (#7090)
### What this PR does / why we need it?
This is a bug fix for an issue where MoE models fail to load quantized
weights in w4a8 format when EP (expert parallelism) is not enabled. The
parameters `["weight_scale_second", "weight_offset_second", "scale_bias"]`
must always be parsed in per-group mode, regardless of other conditions.
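The logic change can be sketched as follows. This is a minimal stand-alone illustration of the fix, not the real `AscendFusedMoEMethod` code; `per_group_params` and its `group_size` argument are hypothetical stand-ins for the attribute checked on `self.quant_method` in the patch.

```python
def per_group_params(group_size: int) -> list[str]:
    """Names of parameters that must be parsed in per-group mode.

    Before the fix, the entire list was gated on group_size > 0, so
    "weight_scale_second", "weight_offset_second", and "scale_bias"
    were skipped whenever group_size was unset (e.g. EP disabled).
    After the fix, those three names are always per-group; only
    "weight_scale"/"weight_offset" remain gated on group_size.
    """
    return ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
        ["weight_scale", "weight_offset"] if group_size > 0 else []
    )
```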
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
```diff
@@ -220,8 +220,8 @@ class AscendFusedMoEMethod(FusedMoEMethodBase):
         set_weight_attrs(param, extra_weight_attrs)

         extra_weight_attrs.update({"quant_method": FusedMoeWeightScaleSupported.CHANNEL.value})
-        per_group_param = (
-            ["weight_scale_second", "weight_offset_second", "scale_bias"] + ["weight_scale", "weight_offset"]
+        per_group_param = ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
+            ["weight_scale", "weight_offset"]
             if hasattr(self.quant_method, "group_size") and self.quant_method.group_size > 0
             else []
         )
```