[Misc] Disable quantization in mindie_turbo (#2147)
### What this PR does / why we need it?
Cherry-pick #1749 from v0.9.1-dev.
Since the interfaces in vllm-ascend change so quickly, the quantization
function in mindie_turbo is no longer needed, so this PR removes it.
Co-authored-by: zouyida <zouyida@huawei.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main:
207b750e19
Signed-off-by: wangli <wangli858794774@gmail.com>
```diff
@@ -47,14 +47,8 @@ class AscendQuantizer:
         if quantization_algorithm in CUSTOMIZED_QUANTIZER_TYPE:
             return
-        try:
-            module = importlib.import_module("mindie_turbo")
-            MindIETurboQuantizer = module.MindIETurboQuantizer
-            return MindIETurboQuantizer.get_quantizer(quant_config, prefix,
-                                                      packed_modules_mapping)
-        except ImportError:
-            return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
-                                                     packed_modules_mapping)
+        return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
+                                                 packed_modules_mapping)

     def build_linear_method(self):
         raise NotImplementedError
```
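After this change, `AscendQuantizer.get_quantizer` no longer probes for `mindie_turbo` via `importlib` and falls straight through to the vllm-ascend quantizer. The sketch below illustrates the simplified dispatch; `VLLMAscendQuantizer`, `CUSTOMIZED_QUANTIZER_TYPE`, and the `quant_config` shape are stand-ins for the real vllm-ascend symbols, not the actual implementation.

```python
# Illustrative sketch only: stand-ins for vllm-ascend internals.
CUSTOMIZED_QUANTIZER_TYPE: list[str] = []  # assumed empty for this sketch


class VLLMAscendQuantizer:
    """Stand-in for vllm-ascend's native quantizer."""

    def __init__(self, quant_config: dict):
        self.quant_config = quant_config

    @classmethod
    def get_quantizer(cls, quant_config, prefix, packed_modules_mapping=None):
        return cls(quant_config)


class AscendQuantizer:
    """Dispatch without the removed mindie_turbo import fallback."""

    @staticmethod
    def get_quantizer(quant_config: dict, prefix: str,
                      packed_modules_mapping=None):
        quantization_algorithm = quant_config.get("quant_method", "")
        # Customized quantizer types are handled elsewhere (hedged stub).
        if quantization_algorithm in CUSTOMIZED_QUANTIZER_TYPE:
            return None
        # Single code path: always use the vllm-ascend quantizer.
        return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
                                                 packed_modules_mapping)
```

The try/except around `importlib.import_module("mindie_turbo")` is gone, so behavior no longer depends on whether mindie_turbo happens to be installed.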