[v0.18.0][Bugfix] Fix Error "AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'" (#7748)
### What this PR does / why we need it?

Cherry-pick from https://github.com/vllm-project/vllm-ascend/pull/7736

**Error information**

When the kimi-k2 model's quantized weights in CompressedTensors format are used, the following error is reported:

`AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'`

**Error cause**

FA3 quantization currently supports only weights produced by modelslim quantization, so the FA3-related methods it relies on are not defined on `AscendCompressedTensorsConfig`.

**Solution**

Check whether the FA3 feature is enabled before invoking the related methods. Additionally, the unused `get_scaled_act_names` method and its corresponding unit test have been removed.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests were updated by removing a deprecated test case, and the refactored logic was reviewed for correctness.

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
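For context, a minimal sketch of the failure mode and the guard this PR applies; `SimpleNamespace` stands in for the real `VllmConfig`, and the layer name is illustrative:

```python
from types import SimpleNamespace

class AscendCompressedTensorsConfig:
    """CompressedTensors quant config: defines no FA3 quantization hooks."""

# Stand-in for the real vllm_config object (assumption for illustration).
vllm_config = SimpleNamespace(quant_config=AscendCompressedTensorsConfig())

# Before the fix, the FA3 path called the hook unconditionally:
try:
    vllm_config.quant_config.enabling_fa_quant(vllm_config, "layers.0.attn")
except AttributeError as e:
    print(e)  # ... object has no attribute 'enabling_fa_quant'

# The fix: getattr() with a False default skips configs that never opted
# into FA3 quantization (see enable_fa_quant in the diff below).
assert not getattr(vllm_config.quant_config, "enable_fa_quant", False)
```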
```diff
@@ -197,3 +197,12 @@ def maybe_auto_detect_quantization(vllm_config) -> None:
     from vllm.config import VllmConfig as _VllmConfig

     vllm_config.quant_config = _VllmConfig._get_quantization_config(model_config, vllm_config.load_config)
+
+
+def enable_fa_quant(vllm_config, layer_name=None) -> bool:
+    if vllm_config.quant_config is not None and getattr(vllm_config.quant_config, "enable_fa_quant", False):
+        if layer_name is not None:
+            return vllm_config.quant_config.enabling_fa_quant(vllm_config, layer_name)
+        else:
+            return True
+    return False
```
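For illustration, a hypothetical call site for the new helper; the layer name and surrounding wiring are assumptions, not part of this diff:

```python
# Hypothetical caller: choose the attention path per layer. enable_fa_quant
# returns False for configs such as AscendCompressedTensorsConfig, which
# never set enable_fa_quant, instead of raising AttributeError.
if enable_fa_quant(vllm_config, layer_name="model.layers.0.self_attn"):
    ...  # FA3-quantized attention path
else:
    ...  # unquantized attention path
```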