[Feature] Modify description and api for ascend quantization (#243)

### What this PR does / why we need it?
1. It adds more descriptions for the classes in quant_config.py.
2. It renames AscendQKVQuantAttentionMethod to AscendKVCacheMethod to
align with vLLM's naming style.
3. It modifies the flow used when AscendLinearMethod or
AscendKVCacheMethod calls create_weights.


### Does this PR introduce _any_ user-facing change?
Yes. When creating weights, AscendLinearMethod now uses the get_weight,
get_pertensor_param, and get_perchannel_param APIs from the linear
quantization implementation, while AscendKVCacheMethod passes the layer
object into the quantization implementation.
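The new flow above can be sketched as follows. This is a hypothetical illustration with stand-in classes: the quant-implementation method names (`get_weight`, `get_pertensor_param`, `get_perchannel_param`) come from this PR, but the plain-dict "parameters" replace the real torch-based objects in vLLM.

```python
class FakeQuantImpl:
    """Stand-in for a linear quantization implementation (hypothetical)."""

    def get_weight(self, input_size, output_size):
        # Describes the quantized weight tensor to allocate.
        return {"shape": (output_size, input_size), "dtype": "int8"}

    def get_pertensor_param(self):
        # One value per tensor, e.g. activation scale/offset.
        return {"input_scale": 1.0, "input_offset": 0}

    def get_perchannel_param(self, output_size):
        # One value per output channel.
        return {"weight_scale": [1.0] * output_size}


class FakeLayer:
    """Stand-in for a torch.nn.Module; stores registered parameters."""

    def __init__(self):
        self.params = {}

    def register_parameter(self, name, value):
        self.params[name] = value


def create_weights(layer, quant_impl, input_size, output_size):
    """Mimics the post-PR AscendLinearMethod.create_weights: the weight
    and the per-tensor/per-channel quant params are all fetched from the
    quant implementation and registered on the layer."""
    layer.register_parameter("weight",
                             quant_impl.get_weight(input_size, output_size))
    for name, value in quant_impl.get_pertensor_param().items():
        layer.register_parameter(name, value)
    for name, value in quant_impl.get_perchannel_param(output_size).items():
        layer.register_parameter(name, value)
    return layer


layer = create_weights(FakeLayer(), FakeQuantImpl(), input_size=4, output_size=2)
print(sorted(layer.params))  # -> ['input_offset', 'input_scale', 'weight', 'weight_scale']
```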

### How was this patch tested?
By performing offline inference.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Author: Angazenn, committed via GitHub, 2025-03-06 15:17:25 +08:00
Parent: cff08f9df8
Commit: 3217f0d10f
2 changed files with 52 additions and 72 deletions


```diff
@@ -25,7 +25,7 @@ class AscendQuantizer:
     """An interface to different quantization implementations for ascend hardwares."""

     @classmethod
-    def get_quantizer(cls, quant_config: Dict[str, Any]):
+    def get_quantizer(cls, quant_config: Dict[str, Any], prefix: str):
         # TODO: Need a param to choose quantization algorithms.
         quantization_algorithm = ''
@@ -39,7 +39,7 @@ class AscendQuantizer:
             raise NotImplementedError(
                 "There is no available ascend quantizer.")
-        return MindIETurboQuantizer.get_quantizer(quant_config)
+        return MindIETurboQuantizer.get_quantizer(quant_config, prefix)

     def build_linear_method(self):
         raise NotImplementedError
```
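A plausible motivation for threading `prefix` through `get_quantizer` is that the layer's dotted module name lets the quantizer pick per-layer settings. The sketch below illustrates that idea; the config layout and class names are invented for illustration and are not the actual MindIE Turbo behaviour.

```python
from typing import Any, Dict


class SimpleQuantizer:
    """Hypothetical quantizer that records the scheme chosen for one layer."""

    def __init__(self, scheme: str):
        self.scheme = scheme


def get_quantizer(quant_config: Dict[str, Any], prefix: str) -> SimpleQuantizer:
    # The prefix identifies the layer being quantized, e.g.
    # "model.layers.0.self_attn.qkv_proj". Fall back to the global
    # default when the layer has no override.
    layer_overrides = quant_config.get("layer_overrides", {})
    scheme = layer_overrides.get(prefix, quant_config.get("default", "w8a8"))
    return SimpleQuantizer(scheme)


config = {
    "default": "w8a8",
    "layer_overrides": {"model.layers.0.self_attn.qkv_proj": "float"},
}
print(get_quantizer(config, "model.layers.0.self_attn.qkv_proj").scheme)  # -> float
print(get_quantizer(config, "model.layers.1.mlp.gate_up_proj").scheme)    # -> w8a8
```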