[Misc] Disable quantization in mindie_turbo (#2147)
### What this PR does / why we need it?
Cherry-pick #1749 from v0.9.1-dev.
The vllm-ascend interface has changed so quickly that the quantization
function in mindie_turbo is no longer needed, so AscendQuantizer no
longer dispatches to it.
Co-authored-by: zouyida <zouyida@huawei.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main: 207b750e19
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -47,12 +47,6 @@ class AscendQuantizer:
         if quantization_algorithm in CUSTOMIZED_QUANTIZER_TYPE:
             return
 
-        try:
-            module = importlib.import_module("mindie_turbo")
-            MindIETurboQuantizer = module.MindIETurboQuantizer
-            return MindIETurboQuantizer.get_quantizer(quant_config, prefix,
-                                                      packed_modules_mapping)
-        except ImportError:
-            return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
-                                                     packed_modules_mapping)
+        return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
+                                                 packed_modules_mapping)
 