[Feat] Qwen3 MoE supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205)
### What this PR does / why we need it?
1. In the quantization scenario, Qwen3 MoE now uses the fused `add_rms_norm_quant` op by default instead of the separate `add_rms_norm` and `quant` ops.
2. `torch_npu.add_rms_norm_quant` fixes an accuracy issue when model weights are quantized with anti_method m4. m4 is an asymmetric outlier-suppression quantization method, so it produces a non-zero norm bias; the `add_rms_norm_quant` op has been updated to take this bias as an extra parameter in its computation (the fused semantics are sketched below).

### Does this PR introduce _any_ user-facing change?
Please use a torch_npu version >= torch_npu-2.7.1.dev20250919.

### How was this patch tested?
1. No special parameters and no new environment variables need to be set.
2. Tested with Qwen3 MoE quantized models such as Qwen3-235B-A22B-W8A8, Qwen3-30B-A3B-W8A8, and Qwen3-235B-A22B-Instruct-2507-m4 (anti_method m4).

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: huangdong2022 <huangdong51@huawei.com>
Signed-off-by: h30027576 <huangdong51@huawei.com>
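For reference, the fused op computes the residual add, RMSNorm, and int8 quantization in a single kernel, with the m4 norm bias now folded into the normalization step. Below is a minimal pure-PyTorch sketch of those semantics — the function name, argument order, and per-tensor static quantization are illustrative assumptions, not the torch_npu kernel's actual API:

```python
import torch

def add_rms_norm_quant_ref(x, residual, gamma, beta, scale, offset, eps=1e-6):
    # 1) Residual add -- the "add" in add_rms_norm_quant.
    x = x + residual
    # 2) RMSNorm with weight gamma; beta is the non-zero norm bias that
    #    anti_method m4 produces (previously not consumed by the fused op).
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    y = x * rms * gamma + beta
    # 3) Static per-tensor int8 quantization.
    y_q = torch.clamp(torch.round(y / scale + offset), -128, 127).to(torch.int8)
    # Return the quantized activation and the pre-norm sum, which feeds
    # the next layer's residual connection.
    return y_q, x
```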
```diff
@@ -505,8 +505,7 @@ def register_ascend_customop(vllm_config: Optional[VllmConfig] = None):
     from vllm_ascend.ops.activation import AscendQuickGELU, AscendSiluAndMul
     from vllm_ascend.ops.common_fused_moe import (AscendFusedMoE,
                                                   AscendSharedFusedMoE)
-    from vllm_ascend.ops.layernorm import (AscendGemmaRMSNorm,
-                                           AscendQuantRMSNorm, AscendRMSNorm)
+    from vllm_ascend.ops.layernorm import AscendGemmaRMSNorm, AscendRMSNorm
     from vllm_ascend.ops.linear import (AscendColumnParallelLinear,
                                         AscendMergedColumnParallelLinear,
                                         AscendQKVParallelLinear,
@@ -537,11 +536,6 @@ def register_ascend_customop(vllm_config: Optional[VllmConfig] = None):
         "MultiHeadLatentAttention": AscendMultiHeadLatentAttention,
     }
 
-    if vllm_config is not None and \
-       vllm_config.quant_config is not None and \
-       any("norm.bias" in name for name in vllm_config.quant_config.quant_description.keys()):
-        REGISTERED_ASCEND_OPS["RMSNorm"] = AscendQuantRMSNorm
-
     for name, op_cls in REGISTERED_ASCEND_OPS.items():
         CustomOp.register_oot(_decorated_op_cls=op_cls, name=name)
```
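With the op handling the norm bias itself, the quant-specific `AscendQuantRMSNorm` registration above is dropped and `AscendRMSNorm` is registered unconditionally. For context, here is a minimal sketch of the out-of-tree registration mechanism the remaining loop uses, reduced to a single entry — the import paths are assumed to match what this file already uses:

```python
from vllm.model_executor.custom_op import CustomOp
from vllm_ascend.ops.layernorm import AscendRMSNorm

# Same call the loop above performs for each registry entry: vLLM will
# instantiate AscendRMSNorm wherever a model requests the op registered
# under the name "RMSNorm".
CustomOp.register_oot(_decorated_op_cls=AscendRMSNorm, name="RMSNorm")
```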