[Feature] Modify description and api for ascend quantization (#243)

### What this PR does / why we need it?
1. It adds more descriptions for the classes in quant_config.py.
2. It renames AscendQKVQuantAttentionMethod to AscendKVCacheMethod to
align with vLLM's naming style.
3. It modifies the flow used when AscendLinearMethod or
AscendKVCacheMethod calls create_weights.


### Does this PR introduce _any_ user-facing change?
Yes. When creating weights, AscendLinearMethod now uses the get_weight,
get_pertensor_param, and get_perchannel_param APIs from the linear
quantization implementation, while AscendKVCacheMethod passes the layer
object into the quantization implementation.
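The new flow above can be sketched as follows. This is a hypothetical illustration with stand-in classes: the quant-implementation method names (`get_weight`, `get_pertensor_param`, `get_perchannel_param`) come from this PR, but the plain-dict "parameters" replace the real torch-based objects in vLLM.

```python
class FakeQuantImpl:
    """Stand-in for a linear quantization implementation (hypothetical)."""

    def get_weight(self, input_size, output_size):
        # Describes the quantized weight tensor to allocate.
        return {"shape": (output_size, input_size), "dtype": "int8"}

    def get_pertensor_param(self):
        # One value per tensor, e.g. activation scale/offset.
        return {"input_scale": 1.0, "input_offset": 0}

    def get_perchannel_param(self, output_size):
        # One value per output channel.
        return {"weight_scale": [1.0] * output_size}


class FakeLayer:
    """Stand-in for a torch.nn.Module; stores registered parameters."""

    def __init__(self):
        self.params = {}

    def register_parameter(self, name, value):
        self.params[name] = value


def create_weights(layer, quant_impl, input_size, output_size):
    """Mimics the post-PR AscendLinearMethod.create_weights: the weight
    and the per-tensor/per-channel quant params are all fetched from the
    quant implementation and registered on the layer."""
    layer.register_parameter("weight",
                             quant_impl.get_weight(input_size, output_size))
    for name, value in quant_impl.get_pertensor_param().items():
        layer.register_parameter(name, value)
    for name, value in quant_impl.get_perchannel_param(output_size).items():
        layer.register_parameter(name, value)
    return layer


layer = create_weights(FakeLayer(), FakeQuantImpl(), input_size=4, output_size=2)
print(sorted(layer.params))  # -> ['input_offset', 'input_scale', 'weight', 'weight_scale']
```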

### How was this patch tested?
By performing offline inference.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Author: Angazenn, committed via GitHub, 2025-03-06 15:17:25 +08:00
Parent: cff08f9df8
Commit: 3217f0d10f
2 changed files with 52 additions and 72 deletions


```diff
@@ -25,7 +25,7 @@ class AscendQuantizer:
     """An interface to different quantization implementations for ascend hardwares."""

     @classmethod
-    def get_quantizer(cls, quant_config: Dict[str, Any]):
+    def get_quantizer(cls, quant_config: Dict[str, Any], prefix: str):
         # TODO: Need a param to choose quantization algorithms.
         quantization_algorithm = ''
@@ -39,7 +39,7 @@ class AscendQuantizer:
             raise NotImplementedError(
                 "There is no available ascend quantizer.")
-        return MindIETurboQuantizer.get_quantizer(quant_config)
+        return MindIETurboQuantizer.get_quantizer(quant_config, prefix)

     def build_linear_method(self):
         raise NotImplementedError
```
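A plausible motivation for threading `prefix` through `get_quantizer` is that the layer's dotted module name lets the quantizer pick per-layer settings. The sketch below illustrates that idea; the config layout and class names are invented for illustration and are not the actual MindIE Turbo behaviour.

```python
from typing import Any, Dict


class SimpleQuantizer:
    """Hypothetical quantizer that records the scheme chosen for one layer."""

    def __init__(self, scheme: str):
        self.scheme = scheme


def get_quantizer(quant_config: Dict[str, Any], prefix: str) -> SimpleQuantizer:
    # The prefix identifies the layer being quantized, e.g.
    # "model.layers.0.self_attn.qkv_proj". Fall back to the global
    # default when the layer has no override.
    layer_overrides = quant_config.get("layer_overrides", {})
    scheme = layer_overrides.get(prefix, quant_config.get("default", "w8a8"))
    return SimpleQuantizer(scheme)


config = {
    "default": "w8a8",
    "layer_overrides": {"model.layers.0.self_attn.qkv_proj": "float"},
}
print(get_quantizer(config, "model.layers.0.self_attn.qkv_proj").scheme)  # -> float
print(get_quantizer(config, "model.layers.1.mlp.gate_up_proj").scheme)    # -> w8a8
```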