[Quantization] register AscendQuantRMSNorm for quantization (#2856)

### What this PR does / why we need it? modelslim will generate self.bias for rms norm in quantization, since RMSNorm in vllm has no this parameter, so its nesscesary to create a AscendQuantRmsNorm. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? tested by deepseek-v3.1-w8a8 <img width="2496" height="592" alt="image" src="https://github.com/user-attachments/assets/004c6e76-3d7a-4a1f-b59f-a14304012663" /> - vLLM version: main - vLLM main: d6249d0699 Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-09-11 23:14:02 +08:00
parent eab3635850
commit f5a97e8fa5
4 changed files with 35 additions and 7 deletions
--- a/vllm_ascend/quantization/utils.py
+++ b/vllm_ascend/quantization/utils.py
@@ -9,8 +9,6 @@ from .w8a8 import (AscendC8KVCacheMethod, AscendW8A8FusedMoEMethod,
 from .w8a8_dynamic import (AscendW8A8DynamicFusedMoEMethod,
                           AscendW8A8DynamicLinearMethod)

-patched = False
-
 ASCEND_QUANTIZATION_METHOD_MAP: Dict[str, Dict[str, Type[Any]]] = {
    "W4A8_DYNAMIC": {
        "linear": AscendW4A8DynamicLinearMethod,