enable online serving quantization (#877)
For online serving, "ascend" is not among the natively available quantization methods. This commit adds "ascend" to the list of quantization methods so that users can enable quantization with the "vllm serve --quantization ascend" command.

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
@@ -38,6 +38,8 @@ else:
 # Maximum number of graphs that can be captured by ACL Graph
 MAX_CAPTURE_SIZE = 1920
 
+ASCEND_QUATIZATION_METHOD = "ascend"
+
 
 def try_register_lib(lib_name: str, lib_info: str = ""):
     import importlib
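
The commit message describes adding "ascend" to the list of accepted quantization methods, but the hunk shown here only defines the constant. The following is a minimal sketch of how such a registration could look, not the code from this commit. It assumes a plain argparse-based CLI; the parser, its built-in choices ("awq", "gptq"), and the helper names build_parser and enable_ascend_quantization are hypothetical. It relies on the private argparse attribute _option_string_actions.

```python
import argparse

# Spelling matches the constant introduced in the diff above.
ASCEND_QUATIZATION_METHOD = "ascend"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a serving CLI parser.
    # The built-in choices listed here are illustrative only.
    parser = argparse.ArgumentParser(prog="serve")
    parser.add_argument("--quantization", choices=["awq", "gptq"], default=None)
    return parser


def enable_ascend_quantization(parser: argparse.ArgumentParser) -> None:
    # Look up the action registered for --quantization (private argparse
    # attribute) and extend its list of choices in place.
    action = parser._option_string_actions.get("--quantization")
    if action is None or action.choices is None:
        return
    # Append only once so repeated calls stay idempotent.
    if ASCEND_QUATIZATION_METHOD not in action.choices:
        action.choices.append(ASCEND_QUATIZATION_METHOD)


parser = build_parser()
enable_ascend_quantization(parser)
args = parser.parse_args(["--quantization", "ascend"])
print(args.quantization)
```

Extending the choices of an existing argument, rather than redefining the argument, keeps the original defaults and help text and lets a plugin add its method without touching the parser's definition.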