[bugfix] Fix failure to load w4a8 weights when EP is not enabled. (#7090)

### What this PR does / why we need it?
This is a bug fix for an issue where MoE models fail to load quantized
weights in w4a8 format when EP is not enabled. The parameters
["weight_scale_second", "weight_offset_second", "scale_bias"] must always be
parsed in per-group mode, regardless of other conditions.
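
The root cause is conditional-expression precedence: in the old code the `if ... else []` guarded the *entire* concatenated list, so the second-level parameters were dropped whenever `group_size` was absent. A minimal sketch of the old vs. fixed expressions (`has_group_size` is a simplified stand-in for the real `hasattr(self.quant_method, "group_size") and self.quant_method.group_size > 0` check):

```python
def per_group_params_old(has_group_size: bool) -> list:
    # Old (buggy) form: the conditional applies to the WHOLE sum,
    # so with no group_size the second-level params vanish too.
    return (
        ["weight_scale_second", "weight_offset_second", "scale_bias"]
        + ["weight_scale", "weight_offset"]
        if has_group_size
        else []
    )

def per_group_params_new(has_group_size: bool) -> list:
    # Fixed form: only "weight_scale"/"weight_offset" are conditional;
    # the second-level params are always parsed in per-group mode.
    return ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
        ["weight_scale", "weight_offset"] if has_group_size else []
    )

print(per_group_params_old(False))  # → []  (bug: second-level params lost)
print(per_group_params_new(False))  # → ['weight_scale_second', 'weight_offset_second', 'scale_bias']
```

With `group_size` present, both forms return the same five parameters; the behavior only diverges on the EP-disabled path.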
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
Author: shaopeng-666
Date: 2026-03-10 16:57:05 +08:00
Committed by: GitHub
Parent: a5ea699e29
Commit: 6e8d3681ae


@@ -220,8 +220,8 @@ class AscendFusedMoEMethod(FusedMoEMethodBase):
         set_weight_attrs(param, extra_weight_attrs)
         extra_weight_attrs.update({"quant_method": FusedMoeWeightScaleSupported.CHANNEL.value})
-        per_group_param = (
-            ["weight_scale_second", "weight_offset_second", "scale_bias"] + ["weight_scale", "weight_offset"]
+        per_group_param = ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
+            ["weight_scale", "weight_offset"]
             if hasattr(self.quant_method, "group_size") and self.quant_method.group_size > 0
             else []
         )