[bugfix] Fix failure to load w4a8 weights when EP is not enabled (#7090)
### What this PR does / why we need it?
This is a bug fix for an issue where MoE models fail to load quantized
weights in w4a8 format when EP (expert parallelism) is not enabled. The
parameters `["weight_scale_second", "weight_offset_second", "scale_bias"]`
must always be parsed in per-group mode, regardless of other conditions.
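The logic change can be sketched as follows. This is a minimal stand-alone illustration of the fix, not the real `AscendFusedMoEMethod` code; `per_group_params` and its `group_size` argument are hypothetical stand-ins for the attribute checked on `self.quant_method` in the patch.

```python
def per_group_params(group_size: int) -> list[str]:
    """Names of parameters that must be parsed in per-group mode.

    Before the fix, the entire list was gated on group_size > 0, so
    "weight_scale_second", "weight_offset_second", and "scale_bias"
    were skipped whenever group_size was unset (e.g. EP disabled).
    After the fix, those three names are always per-group; only
    "weight_scale"/"weight_offset" remain gated on group_size.
    """
    return ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
        ["weight_scale", "weight_offset"] if group_size > 0 else []
    )
```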
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
```diff
@@ -220,8 +220,8 @@ class AscendFusedMoEMethod(FusedMoEMethodBase):
         set_weight_attrs(param, extra_weight_attrs)

         extra_weight_attrs.update({"quant_method": FusedMoeWeightScaleSupported.CHANNEL.value})
-        per_group_param = (
-            ["weight_scale_second", "weight_offset_second", "scale_bias"] + ["weight_scale", "weight_offset"]
+        per_group_param = ["weight_scale_second", "weight_offset_second", "scale_bias"] + (
+            ["weight_scale", "weight_offset"]
             if hasattr(self.quant_method, "group_size") and self.quant_method.group_size > 0
             else []
         )
```