Drop torchair (#4814)

aclgraph is stable and fast now. Let's drop torchair graph mode now.

TODO: some logic to adapt torchair should be cleaned up as well. We'll
do it in the following PR.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
This commit is contained in:
wangxiyuan
2025-12-10 09:20:40 +08:00
committed by GitHub
parent ba9cda9dfd
commit 835b4c8f1d
84 changed files with 77 additions and 16881 deletions

View File

@@ -104,22 +104,3 @@ First, make sure you specify `ascend` as the quantization method. Second, check
### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
Please convert DeepSeek series models using `br_release_MindStudio_8.1.RC2_TR5_20260624` ModelSlim, where the missing configuration_deepseek.py error has been fixed.
### 3. What should be considered when converting DeepSeek series models with ModelSlim?
When the MLA portion of the weights used the `W8A8_DYNAMIC` quantization with the torchair graph mode enabled, modify the configuration file in the CANN package to prevent incorrect inference results.
The operation steps are as follows:
1. Search in the CANN package directory, for example:
find /usr/local/Ascend/ -name fusion_config.json
2. Add `"AddRmsNormDynamicQuantFusionPass":"off",` and `"MultiAddRmsNormDynamicQuantFusionPass":"off",` to the fusion_config.json you find, the location is as follows:
```bash
{
"Switch":{
"GraphFusion":{
"AddRmsNormDynamicQuantFusionPass":"off",
"MultiAddRmsNormDynamicQuantFusionPass":"off",
```