[Feature] support compressed-tensors w4a16 quantization (#154)

- Native INT4 Kimi model inference is now supported.
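
As context for reviewers: in a w4a16 scheme, weights are stored as 4-bit integers (two per byte) with per-group float scales, and are dequantized to 16-bit floats before the matmul. The sketch below is an illustrative NumPy reference, not the repo's actual `dequant_int4` kernel; the packing layout (low nibble first) and signed-int4 range are assumptions.

```python
import numpy as np

def dequant_w4a16_ref(packed: np.ndarray, scales: np.ndarray,
                      group_size: int) -> np.ndarray:
    """Reference w4a16 dequantization (illustrative only).

    packed: uint8 array, two signed int4 values per byte (low nibble first).
    scales: float16 per-group scales, one per `group_size` weights.
    Returns float16 weights of shape (packed.size * 2,).
    """
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    # Sign-extend the 4-bit values from [0, 15] to [-8, 7].
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    ints = np.empty(packed.size * 2, dtype=np.int8)
    ints[0::2] = lo
    ints[1::2] = hi
    # Scale each group of int4 values by its float scale.
    scaled = (ints.astype(np.float32).reshape(-1, group_size)
              * scales.astype(np.float32)[:, None])
    return scaled.reshape(-1).astype(np.float16)
```

For example, the bytes `[0xE1, 0x87]` unpack to the int4 values `[1, -2, 7, -8]`; with a single group scale of 0.5 they dequantize to `[0.5, -1.0, 3.5, -4.0]`.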

Signed-off-by: Li Wei <liwei.109@outlook.com>
Authored by Li Wei, 2026-01-27 19:56:22 +08:00; committed by GitHub
parent 0711c1abfa
commit 71bd70ad6c
9 changed files with 369 additions and 28 deletions


@@ -2275,7 +2275,7 @@ fwd_kvcache_mla.register_fake(_fake_fwd_kvcache_mla)
 ##################################################
-# --------------- dequant_int4 -----------------
+# --------------- dequant_int4 -------------------
 ##################################################
 @custom_op("_C::dequant_int4", mutates_args=())
 def dequant_int4(