[v0.18.0][Bugfix] Fix Error "AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'" (#7748)
### What this PR does / why we need it?

Cherry-pick from https://github.com/vllm-project/vllm-ascend/pull/7736

**Error information**

When the quantized weights of the kimi-k2 model in CompressedTensors format are used, the following error is raised:

`AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'`

**Error cause**

FA3 quantization currently supports only weights quantized with modelslim, so the related methods are not defined on `AscendCompressedTensorsConfig`.

**Solution**

Check whether the FA3 feature is enabled before invoking the related methods. Additionally, the unused `get_scaled_act_names` method and its corresponding unit test have been removed.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests were updated by removing a deprecated test case, and the refactored logic was reviewed for correctness.

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
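The fix described above can be sketched as a guard that probes for FA3 support before calling into the quant config. This is a hypothetical illustration of the pattern, not the actual `vllm_ascend.quantization.utils.enable_fa_quant` implementation; the dummy config classes in the usage note are assumptions for demonstration.

```python
def enable_fa_quant(vllm_config, layer_name):
    """Return True only when the quant config supports FA3 quantization.

    Sketch of the guard pattern from this PR: configs such as
    AscendCompressedTensorsConfig (CompressedTensors weights) do not define
    enabling_fa_quant, so we probe for the method instead of calling it
    unconditionally and raising AttributeError.
    """
    quant_config = vllm_config.quant_config
    if quant_config is None:
        # No quantization configured at all.
        return False
    if not hasattr(quant_config, "enabling_fa_quant"):
        # e.g. CompressedTensors weights: FA3 quantization is unsupported.
        return False
    return quant_config.enabling_fa_quant(vllm_config, layer_name)
```

With this helper, the attention implementation can compute `self.fa_quant_layer = enable_fa_quant(self.vllm_config, self.layer_name)` safely for any quant config.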
```diff
@@ -49,6 +49,7 @@ from vllm_ascend.ops.layer_shard_linear import (
 )
 from vllm_ascend.ops.rotary_embedding import get_cos_and_sin_mla
 from vllm_ascend.quantization.methods.w8a8_static import AscendW8A8LinearMethod
+from vllm_ascend.quantization.utils import enable_fa_quant
 from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND, get_weight_prefetch_method, maybe_trans_nz, weak_ref_tensors
 from vllm_ascend.worker.npu_input_batch import NPUInputBatch

@@ -730,10 +731,7 @@ class AscendMLAImpl(MLAAttentionImpl):
             self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.is_kv_producer
         )
         self.layer_name = kwargs.get("layer_name")
-        quant_config = self.vllm_config.quant_config
-        self.fa_quant_layer = (
-            quant_config.enabling_fa_quant(self.vllm_config, self.layer_name) if quant_config is not None else False
-        )
+        self.fa_quant_layer = enable_fa_quant(self.vllm_config, self.layer_name)
         self.dtype = torch.int8 if self.fa_quant_layer else self.vllm_config.model_config.dtype
         self.layer_sharding_kwargs = []
         for layer_name in get_ascend_config().layer_sharding or []:
```