[main][quantization] Support deepseek w4a8 per-channel quantization (#3011)

### What this PR does / why we need it?
1. Support DeepSeek w4a8 per-channel quantization.
2. Eager mode now supports converting weights to the NZ format; a rough sketch of what this conversion means is shown below.
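
The NZ (FRACTAL_NZ) layout is an Ascend-specific storage format that the NPU matmul kernels consume directly. The Python sketch below only illustrates what such a cast looks like through the public `torch_npu` API; it is not code from this patch (the actual conversion happens inside vllm-ascend's quantization layers), and the weight shape is made up.

```python
# Illustrative only -- not code from this PR. Shows what "converting a weight
# to the NZ format" means on Ascend NPUs via the public torch_npu API.
import torch
import torch_npu  # Ascend adapter for PyTorch

ACL_FORMAT_FRACTAL_NZ = 29  # CANN enum value for the FRACTAL_NZ layout

# Hypothetical weight, for demonstration only.
weight = torch.randn(7168, 2048, dtype=torch.float16).npu()

# Cast the tensor's internal storage layout from ND to FRACTAL_NZ.
weight_nz = torch_npu.npu_format_cast(weight, ACL_FORMAT_FRACTAL_NZ)
print(torch_npu.get_npu_format(weight_nz))  # prints 29 if the cast succeeded
```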
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
#### How to get weights using Modelslim

##### Installation steps

```bash
git clone https://gitcode.com/Ascend/msit.git
cd msit/msmodelslim
bash install.sh
```

##### Generate w4a8 per-channel weights

```bash
cd /example/DeepSeek
```
Command reference: `msmodelslim/example/DeepSeek/README.md`
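
Once the w4a8 per-channel weights have been generated, they can be served with vllm-ascend. The snippet below is a minimal offline-inference sketch and is not part of this patch; it assumes the msmodelslim output directory from the step above and vllm-ascend's `ascend` quantization method, and the model path and parallelism settings are placeholders to adjust for your environment.

```python
# Minimal offline-inference sketch (not part of this PR). Paths and
# parallelism are placeholders; "ascend" selects vllm-ascend's quantization
# method for modelslim-generated weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-w4a8",  # placeholder: msmodelslim output dir
    quantization="ascend",           # load the Ascend-quantized checkpoint
    tensor_parallel_size=16,         # placeholder: match your NPU topology
    max_model_len=4096,
    trust_remote_code=True,
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```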

- vLLM version: v0.10.2
- vLLM main: f225ea7dd9

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Author: Wang Kunpeng
Date: 2025-09-27 21:01:16 +08:00 (committed by GitHub)
Parent: e9359bd8fa
Commit: 859e861d92
6 changed files with 299 additions and 196 deletions


@@ -108,18 +108,19 @@ Please convert DeepSeek series models using `br_release_MindStudio_8.1.RC2_TR5_2
### 3. When converting deepseek series models with modelslim, what should you pay attention to?
-When using the weight generated by modelslim with the `--dynamic` parameter, if torchair graph mode is enabled, please modify the configuration file in the CANN package to prevent incorrect inference results.
+When the MLA portion of the weights uses `W8A8_DYNAMIC` quantization, if torchair graph mode is enabled, please modify the configuration file in the CANN package to prevent incorrect inference results.
The operation steps are as follows:
1. Search in the CANN package directory used, for example:
find /usr/local/Ascend/ -name fusion_config.json
2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` to the fusion_config.json you find, the location is as follows:
2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` and `"MultiAddRmsNormDynamicQuantFusionPass":"off",` to the fusion_config.json you find, the location is as follows:
```bash
{
"Switch":{
"GraphFusion":{
"AddRmsNormDynamicQuantFusionPass":"off",
"MultiAddRmsNormDynamicQuantFusionPass":"off",
```