[main][Doc] add mla pertoken quantization FAQ (#2018)
### What this PR does / why we need it?
When using DeepSeek series model weights generated by modelslim with the `--dynamic` parameter,
if torchair graph mode is enabled, the configuration file in the CANN package must be modified
to prevent incorrect inference results.
- vLLM version: v0.10.0
- vLLM main: 7728dd77bb
---------
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
@@ -105,3 +105,21 @@ submit a issue, maybe some new models need to be adapted.
### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
Please convert DeepSeek series models using modelslim version `modelslim-VLLM-8.1.RC1.b020_001`; this version fixes the missing `configuration_deepseek.py` error.
### 3. What should I pay attention to when converting DeepSeek series models with modelslim?
|
||||||
|
|
||||||
|
When using weights generated by modelslim with the `--dynamic` parameter, if torchair graph mode is enabled, modify the `fusion_config.json` configuration file in the CANN package to prevent incorrect inference results.

The operation steps are as follows:
1. Search the directory of the CANN package in use for `fusion_config.json`, for example:
   `find /usr/local/Ascend/ -name fusion_config.json`
2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` inside the `"GraphFusion"` section of the `fusion_config.json` file you found, as shown below:
   ```json
   {
       "Switch":{
           "GraphFusion":{
               "AddRmsNormDynamicQuantFusionPass":"off",
               ...
           },
           ...
       }
   }
   ```
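The two manual steps above can also be scripted. Below is a minimal sketch under stated assumptions, not an official vLLM Ascend or CANN tool: it assumes GNU `grep`/`sed`, that each `fusion_config.json` is laid out like the snippet above (a `"GraphFusion":{` line followed by at least one existing entry), and the function names `find_fusion_configs`, `pass_is_off`, and `disable_fusion_pass` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of vLLM Ascend or CANN) that script the manual
# fusion_config.json edit. Assumes GNU grep/sed and a file laid out like the
# snippet above: a "GraphFusion":{ line followed by at least one existing entry.

PASS_NAME="AddRmsNormDynamicQuantFusionPass"

# Print every fusion_config.json under a CANN install directory
# (defaults to /usr/local/Ascend, the example path from step 1).
find_fusion_configs() {
  local cann_dir="${1:-/usr/local/Ascend}"
  find "$cann_dir" -name fusion_config.json
}

# Exit 0 only if the pass is explicitly set to "off" in the given file.
pass_is_off() {
  grep -Eq "\"$PASS_NAME\"[[:space:]]*:[[:space:]]*\"off\"" "$1"
}

# Switch the pass off in one file, saving the previous contents as <file>.bak.
disable_fusion_pass() {
  local cfg="$1"
  cp "$cfg" "$cfg.bak"
  if grep -q "\"$PASS_NAME\"" "$cfg"; then
    # Key already present (e.g. set to "on"): rewrite its value to "off".
    sed -i -E "s/\"$PASS_NAME\"[[:space:]]*:[[:space:]]*\"[^\"]*\"/\"$PASS_NAME\":\"off\"/" "$cfg"
  else
    # Key absent: insert it on a new line right after the "GraphFusion":{ line.
    # The trailing comma relies on another entry following inside GraphFusion.
    sed -i -E "/\"GraphFusion\"[[:space:]]*:[[:space:]]*\\{/a \"$PASS_NAME\":\"off\"," "$cfg"
  fi
}
```

For example, after sourcing the file, `find_fusion_configs | while IFS= read -r f; do disable_fusion_pass "$f" && pass_is_off "$f" && echo "disabled in $f"; done` applies and checks the change for every file found; writing under `/usr/local/Ascend/` may require root privileges. A line-based `sed` edit mirrors the manual step without extra dependencies; a JSON-aware tool such as `jq` would be more robust where it is available.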