[3/N][refactor] refactor quantization (#2504)
### What this PR does / why we need it?
Move the torchair-related quantization section into the torchair directory to make the code clearer. As a next step, we'll remove all torchair-related code outside of the torchair quantization.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- vLLM version: main, vLLM main: ab9f2cfd19
- vLLM version: v0.10.1.1, vLLM main: 959783fb99

Signed-off-by: hust17yixuan <303660421@qq.com>
```diff
@@ -38,6 +38,7 @@ from vllm_ascend.torchair.utils import (TorchairCommonAttentionMetadata,
                                         check_torchair_cache_exist,
                                         converting_weight_acl_format,
                                         register_torchair_model,
+                                        torchair_quant_method_register,
                                         write_kv_cache_bytes_to_file)
 from vllm_ascend.utils import (ACL_FORMAT_FRACTAL_ND, ACL_FORMAT_FRACTAL_NZ,
                                is_310p)
```
```diff
@@ -67,6 +68,7 @@ class NPUTorchairModelRunner(NPUModelRunner):

         self._check_batch_sizes_consistency()
         register_torchair_model()
+        torchair_quant_method_register()

     def _get_forward_metadata_across_dp_and_pad(
             self, num_tokens: int, with_prefill: bool, enable_dbo: bool
```
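The `torchair_quant_method_register()` call added at runner init follows a common registry pattern: quantization methods are defined inside the torchair package and registered once, so no torchair-specific code has to live elsewhere. The sketch below is purely illustrative — the registry name, decorator, and `W8A8QuantMethod` class are hypothetical stand-ins, not the actual vllm-ascend API.

```python
# Hypothetical sketch of a quant-method registration pattern, assuming a
# simple name -> class registry. None of these identifiers come from
# vllm-ascend itself.

TORCHAIR_QUANT_METHODS: dict = {}  # registry: method name -> implementation class


def register_quant_method(name: str):
    """Decorator that records a quantization method class under `name`."""
    def wrap(cls):
        TORCHAIR_QUANT_METHODS[name] = cls
        return cls
    return wrap


@register_quant_method("w8a8")
class W8A8QuantMethod:
    """Placeholder for a torchair W8A8 quantization method."""


def torchair_quant_method_register() -> dict:
    """Called once at runner init; returns the populated registry."""
    return TORCHAIR_QUANT_METHODS
```

Keeping the registration entry point inside the torchair package means the model runner only needs the one call shown in the diff, which is what lets the follow-up step remove the remaining torchair code from outside that directory.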