[Quantization] register AscendQuantRMSNorm for quantization (#2856)

### What this PR does / why we need it?

modelslim generates a `self.bias` parameter for RMSNorm layers during quantization, but `RMSNorm` in vLLM has no such parameter, so it is necessary to create an `AscendQuantRMSNorm` that carries it.
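
The idea can be sketched as an RMSNorm variant that simply adds the bias term the quantized checkpoint expects. This is a minimal illustrative sketch, not the actual `AscendQuantRMSNorm` implementation; the class name and forward math are assumptions based on the standard RMSNorm formula plus an additive bias.

```python
import torch
import torch.nn as nn


class QuantRMSNormSketch(nn.Module):
    """Hypothetical sketch: RMSNorm extended with the extra `bias`
    parameter that modelslim-quantized checkpoints provide (the stock
    vLLM RMSNorm has no such parameter)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        # Extra learnable bias absent from vLLM's RMSNorm.
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard RMSNorm: scale by reciprocal RMS, then apply
        # weight and the additional bias.
        var = x.pow(2).mean(dim=-1, keepdim=True)
        x = x * torch.rsqrt(var + self.eps)
        return x * self.weight + self.bias
```

With the bias registered as a parameter, checkpoint loading can populate it directly instead of failing on an unexpected key.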
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Tested with deepseek-v3.1-w8a8.

<img width="2496" height="592" alt="image"
src="https://github.com/user-attachments/assets/004c6e76-3d7a-4a1f-b59f-a14304012663"
/>


- vLLM version: main
- vLLM main:
d6249d0699

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
Commit f5a97e8fa5 by 22dimensions, 2025-09-11 23:14:02 +08:00, committed by GitHub; parent eab3635850.
4 changed files with 35 additions and 7 deletions


```diff
@@ -83,7 +83,7 @@ class NPUWorker(WorkerBase):
         from vllm_ascend import ops
         ops.register_dummy_fusion_op()
         _register_atb_extensions()
-        register_ascend_customop()
+        register_ascend_customop(vllm_config)
         # init ascend config and soc version
         init_ascend_config(vllm_config)
         init_ascend_soc_version()
```