enable online serving quantization (#877)

For online serving, "ascend" is not offered as a native choice of
quantization method. This change adds "ascend" to the list of
quantization methods so that users can enable quantization with the
"vllm serve --quantization ascend" command.
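
A minimal sketch of the idea, not the code in this commit: the list
"quantization_methods" and the helper "enable_ascend_quantization" are
hypothetical stand-ins for whatever list the serving CLI validates
against. Only the constant name and its value come from the diff.

```python
# Constant name and value as they appear in the diff (spelling kept as-is).
ASCEND_QUATIZATION_METHOD = "ascend"

# Hypothetical stand-in for the list of method names the CLI accepts.
quantization_methods = ["awq", "gptq", "fp8"]


def enable_ascend_quantization(methods: list[str]) -> None:
    """Append the Ascend method name once, so repeated calls are harmless."""
    if ASCEND_QUATIZATION_METHOD not in methods:
        methods.append(ASCEND_QUATIZATION_METHOD)


# Calling twice shows that the registration is idempotent.
enable_ascend_quantization(quantization_methods)
enable_ascend_quantization(quantization_methods)
```

Guarding the append with a membership check keeps the list free of
duplicates if the registration runs more than once.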

---------

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
22dimensions
2025-05-17 17:36:04 +08:00
committed by GitHub
parent a8730e7a3c
commit 00e0243561
3 changed files with 17 additions and 5 deletions

@@ -38,6 +38,8 @@ else:
# Maximum number of graphs that can be captured by ACL Graph
MAX_CAPTURE_SIZE = 1920
ASCEND_QUATIZATION_METHOD = "ascend"
def try_register_lib(lib_name: str, lib_info: str = ""):
    import importlib