[Bugfix] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (#4632)

### What this PR does / why we need it?
Fix bugs introduced by
bc67696a02
1. fix an error when getting num_local_expert in vllm_adaptor
2. fix the w1_scale type error in
moe_mlp.quant_apply_mlp.npu_dequant_swiglu_quant in the w4a8 quantized
scenario (both fixes are sketched below)
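A minimal sketch of the two fixes, assuming plain PyTorch; every name here (get_num_local_experts, ep_size, the scale tensors and their shapes) is illustrative and is not the actual vllm-ascend code:

```python
import torch

# Fix 1 (hypothetical names): derive the local expert count from the global
# expert count and the expert-parallel world size instead of reading a stale
# attribute.
def get_num_local_experts(global_num_experts: int, ep_size: int) -> int:
    assert global_num_experts % ep_size == 0
    return global_num_experts // ep_size

# Fix 2 (hypothetical shapes): the w4a8 dequant path expects float32 scales,
# so keep a float32 copy of w13_weight_scale and split it per expert,
# mirroring the w13_weight_scale_fp32_list change in the diff below.
w13_weight_scale = torch.rand(4, 8, dtype=torch.bfloat16)  # (num_experts, intermediate)
w13_weight_scale_fp32 = w13_weight_scale.to(torch.float32)
w13_weight_scale_fp32_list = [
    scale.clone() for scale in w13_weight_scale_fp32.unbind(dim=0)
]
```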

- vLLM version: v0.12.0

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: 欧派果奶我还要 <47294568+845473182@users.noreply.github.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Author: 欧派果奶我还要
Date: 2025-12-05 16:04:24 +08:00
Committed by: GitHub
Parent: a7f91079b8
Commit: a336543977
3 changed files with 4 additions and 4 deletions


@@ -289,7 +289,7 @@ class AscendW8A8DynamicFusedMoEMethod:
         ]
         layer.w13_weight_scale_fp32_list = [
             weight.clone()
-            for weight in layer.w13_weight_scale.data.unbind(dim=0)
+            for weight in layer.w13_weight_scale_fp32.data.unbind(dim=0)
         ]
         layer.w2_weight_scale_list = [
             weight.clone()