[Misc] Disable quantization in mindie_turbo (#2147)

### What this PR does / why we need it?
Cherry-pick of #1749 from v0.9.1-dev.
Because the interfaces in vllm-ascend have been changing so quickly, the
quantization path through mindie_turbo is no longer needed, so this PR
removes it.

Co-authored-by: zouyida <zouyida@huawei.com>
Co-authored-by: wangli <wangli858794774@gmail.com>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.0
- vLLM main:
207b750e19

Signed-off-by: wangli <wangli858794774@gmail.com>
Author: Li Wang
Date: 2025-08-01 08:53:00 +08:00
Committed by: GitHub
Parent: c62f346f5d
Commit: e3b3ffb875
```diff
@@ -47,14 +47,8 @@ class AscendQuantizer:
         if quantization_algorithm in CUSTOMIZED_QUANTIZER_TYPE:
             return
-        try:
-            module = importlib.import_module("mindie_turbo")
-            MindIETurboQuantizer = module.MindIETurboQuantizer
-            return MindIETurboQuantizer.get_quantizer(quant_config, prefix,
-                                                      packed_modules_mapping)
-        except ImportError:
-            return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
-                                                     packed_modules_mapping)
+        return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
+                                                 packed_modules_mapping)

     def build_linear_method(self):
         raise NotImplementedError
```
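For context, the pattern the diff removes can be sketched in isolation as below. The function names and the `None` fallback value here are hypothetical illustrations, not the actual vllm-ascend API; the real code returned a quantizer object from `VLLMAscendQuantizer.get_quantizer(...)` instead.

```python
import importlib


def load_optional_backend(module_name: str):
    """Old behavior (removed by this PR): try an optional plugin module
    first, and silently fall back on ImportError."""
    try:
        # e.g. module_name == "mindie_turbo"
        return importlib.import_module(module_name)
    except ImportError:
        # Plugin not installed: fall back to the built-in implementation
        # (sketched here as None; the real code returned the built-in
        # VLLMAscendQuantizer).
        return None


def load_builtin_backend() -> str:
    """New behavior: always dispatch to the built-in quantizer, with no
    optional-import indirection."""
    return "VLLMAscendQuantizer"
```

The simplification matters because the silent `except ImportError` fallback made it hard to tell which quantizer actually ran, and kept a dependency on a mindie_turbo interface that no longer matches vllm-ascend.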