[Bugfix] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (#4632)

### What this PR does / why we need it?

Fix bugs introduced by bc67696a02:

1. Fix the error when getting `num_local_expert` in `vllm_adaptor`.
2. Fix the `w1_scale` type error in the `npu_dequant_swiglu_quant` call inside `moe_mlp.quant_apply_mlp` in the w4a8 quantized scenario.

- vLLM version: v0.12.0

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: 欧派果奶我还要 <47294568+845473182@users.noreply.github.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-05 16:04:24 +08:00
parent a7f91079b8
commit a336543977
3 changed files with 4 additions and 4 deletions
--- a/vllm_ascend/ops/fused_moe/moe_mlp.py
+++ b/vllm_ascend/ops/fused_moe/moe_mlp.py
@@ -129,7 +129,7 @@ def quant_apply_mlp(hidden_states: torch.Tensor,
            # act_fn: swiglu
            hidden_states, swiglu_out_scale = torch_npu.npu_dequant_swiglu_quant(
                x=hidden_states,
-                weight_scale=w1_scale,
+                weight_scale=w1_scale[0],
                activation_scale=pertoken_scale,
                bias=None,
                quant_scale=None,