From e3a2443c3a61f0f8376e75df4083a1554595af68 Mon Sep 17 00:00:00 2001
From: Wang Kunpeng <1289706727@qq.com>
Date: Sun, 27 Jul 2025 08:47:51 +0800
Subject: [PATCH] [main][Doc] add mla pertoken quantization FAQ (#2018)

### What this PR does / why we need it?
When using DeepSeek series model weights generated with the `--dynamic` parameter, if torchair graph mode is enabled, the configuration file in the CANN package should be modified to prevent incorrect inference results.

- vLLM version: v0.10.0
- vLLM main: https://github.com/vllm-project/vllm/commit/7728dd77bb802e1876012eb264df4d2fa2fc6f3c

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
---
 .../user_guide/feature_guide/quantization.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/user_guide/feature_guide/quantization.md b/docs/source/user_guide/feature_guide/quantization.md
index 1ffc0bf..abdf344 100644
--- a/docs/source/user_guide/feature_guide/quantization.md
+++ b/docs/source/user_guide/feature_guide/quantization.md
@@ -105,3 +105,21 @@ submit a issue, maybe some new models need to be adapted.
 ### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
 
 Please convert DeepSeek series models using `modelslim-VLLM-8.1.RC1.b020_001` modelslim, this version has fixed the missing configuration_deepseek.py error.
+
+### 3. When converting DeepSeek series models with modelslim, what should you pay attention to?
+
+When using weights generated by modelslim with the `--dynamic` parameter, if torchair graph mode is enabled, please modify the configuration file in the CANN package to prevent incorrect inference results.
+
+The operation steps are as follows:
+
+1. Search the CANN package directory in use for the configuration file, for example:
+`find /usr/local/Ascend/ -name fusion_config.json`
+
+2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` to each fusion_config.json you find; the location is as follows:
+
+```json
+{
+    "Switch":{
+        "GraphFusion":{
+            "AddRmsNormDynamicQuantFusionPass":"off",
+```
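The two manual steps in the FAQ entry (find every `fusion_config.json`, then insert the switch under `Switch` -> `GraphFusion`) could also be scripted. Below is a minimal sketch; the helper name `disable_fusion_pass` is illustrative and not part of CANN or vLLM, and it assumes the nesting shown in the JSON fragment above:

```python
import json
from pathlib import Path


def disable_fusion_pass(cann_root, pass_name="AddRmsNormDynamicQuantFusionPass"):
    """Switch off `pass_name` in every fusion_config.json under `cann_root`.

    Returns the paths of the files that were changed.
    """
    changed = []
    for cfg_path in Path(cann_root).rglob("fusion_config.json"):
        cfg = json.loads(cfg_path.read_text())
        # Per the fragment in the doc, the switch lives under Switch -> GraphFusion.
        graph_fusion = cfg.setdefault("Switch", {}).setdefault("GraphFusion", {})
        if graph_fusion.get(pass_name) != "off":
            graph_fusion[pass_name] = "off"
            cfg_path.write_text(json.dumps(cfg, indent=4))
            changed.append(str(cfg_path))
    return changed
```

Back up the original files before running anything like this against a real install (e.g. `/usr/local/Ascend/`); rerunning the helper is a no-op once the switch is already `"off"`.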