[Feature] Modify description and api for ascend quantization (#243)
### What this PR does / why we need it?

1. It adds more description for the classes in quant_config.py.
2. It renames AscendQKVQuantAttentionMethod to AscendKVCacheMethod to align with vLLM naming style.
3. It modifies the process by which AscendLinearMethod and AscendKVCacheMethod call create_weights.

### Does this PR introduce _any_ user-facing change?

Yes. When creating weights, AscendLinearMethod now uses the get_weight, get_pertensor_param and get_perchannel_param APIs from the linear quant implementation, while AscendKVCacheMethod passes the layer into the linear quant implementation.

### How was this patch tested?

By performing offline inference.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
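The description above can be illustrated with a minimal sketch of the new create_weights flow. Everything below is hypothetical: the class `FakeW8A8LinearImpl`, the parameter names, and the shape-tuple representation are stand-ins for illustration only; the real linear quant implementation lives in MindIE Turbo and operates on tensors.

```python
# Hypothetical sketch of the new create_weights flow: the linear method asks
# the quant implementation for three groups of parameters via get_weight,
# get_pertensor_param and get_perchannel_param, then registers them all on the
# layer. Shapes are modeled as plain tuples instead of real tensors.
from typing import Any, Dict, Tuple


class FakeW8A8LinearImpl:
    """Stand-in for a linear quant implementation exposing the new APIs."""

    def get_weight(self, input_size: int,
                   output_size: int) -> Dict[str, Tuple[int, ...]]:
        # The quantized weight itself.
        return {"weight": (output_size, input_size)}

    def get_pertensor_param(self) -> Dict[str, Tuple[int, ...]]:
        # Scalar (per-tensor) quantization parameters.
        return {"input_scale": (1,), "input_offset": (1,)}

    def get_perchannel_param(self,
                             output_size: int) -> Dict[str, Tuple[int, ...]]:
        # One quantization parameter per output channel.
        return {"weight_scale": (output_size, 1),
                "weight_offset": (output_size, 1)}


def create_weights(layer_params: Dict[str, Any], impl: FakeW8A8LinearImpl,
                   input_size: int, output_size: int) -> None:
    """Collect everything the quant implementation asks for onto the layer."""
    layer_params.update(impl.get_weight(input_size, output_size))
    layer_params.update(impl.get_pertensor_param())
    layer_params.update(impl.get_perchannel_param(output_size))


layer_params: Dict[str, Any] = {}
create_weights(layer_params, FakeW8A8LinearImpl(), 128, 64)
print(sorted(layer_params))
# → ['input_offset', 'input_scale', 'weight', 'weight_offset', 'weight_scale']
```

The point of the split is that the linear method no longer needs to know which parameters a given quantization scheme requires; it just registers whatever the implementation returns.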
```diff
@@ -25,7 +25,7 @@ class AscendQuantizer:
     """An interface to different quantization implementations for ascend hardwares."""
 
     @classmethod
-    def get_quantizer(cls, quant_config: Dict[str, Any]):
+    def get_quantizer(cls, quant_config: Dict[str, Any], prefix: str):
         # TODO: Need a param to choose quantization algorithms.
         quantization_algorithm = ''
 
@@ -39,7 +39,7 @@ class AscendQuantizer:
         raise NotImplementedError(
             "There is no available ascend quantizer.")
 
-        return MindIETurboQuantizer.get_quantizer(quant_config)
+        return MindIETurboQuantizer.get_quantizer(quant_config, prefix)
 
     def build_linear_method(self):
         raise NotImplementedError
```
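The diff above threads a `prefix` argument through `AscendQuantizer.get_quantizer` down to the MindIE Turbo quantizer. A plausible motivation is per-layer dispatch: the prefix is the layer's dotted module name, which lets the quantizer look up layer-specific settings. The sketch below is an assumption for illustration; `FakeQuantizer`, the config keys, and the lookup logic are not the real MindIE Turbo API.

```python
# Hypothetical sketch of why get_quantizer takes a `prefix`: the dotted layer
# name lets the quantizer choose layer-specific settings from the config.
from typing import Any, Dict


class FakeQuantizer:
    """Stand-in quantizer that dispatches on the layer prefix."""

    def __init__(self, quant_config: Dict[str, Any], prefix: str):
        self.quant_config = quant_config
        self.prefix = prefix

    @classmethod
    def get_quantizer(cls, quant_config: Dict[str, Any],
                      prefix: str) -> "FakeQuantizer":
        # The prefix identifies which layer is being built, so per-layer
        # overrides can win over the default scheme.
        layer_cfg = quant_config.get(prefix, quant_config.get("default", {}))
        return cls(layer_cfg, prefix)


config = {
    "model.layers.0.self_attn.qkv_proj": {"method": "w8a8"},
    "default": {"method": "float"},
}
q = FakeQuantizer.get_quantizer(config, "model.layers.0.self_attn.qkv_proj")
print(q.quant_config["method"])
# → w8a8
```

Without the prefix, `get_quantizer` could only return one global quantizer for the whole model; with it, each layer can receive its own quantization scheme.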