feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762)

Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
SijiaYang
2025-07-08 05:47:21 +08:00
committed by GitHub
parent 6a6e0bb7fd
commit cb9d91ea8a
10 changed files with 1006 additions and 9 deletions

@@ -708,6 +708,7 @@ class ServerArgs:
         "w8a8_fp8",
         "moe_wna16",
         "qoq",
+        "w4afp8",
     ],
     help="The quantization method.",
 )
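
A minimal sketch of what this hunk changes for users, assuming the list is the `choices=` argument of an `argparse` option. The flag name `--quantization`, the `build_parser` helper, and the `default=None` are assumptions for illustration and are not shown in the hunk; only the four choice strings and the help text come from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser mirroring the tail of the choices list in the hunk."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--quantization",  # assumed flag name, not visible in the hunk
        type=str,
        default=None,  # assumed default
        choices=[
            # Only the entries visible in the hunk; the real list is longer.
            "w8a8_fp8",
            "moe_wna16",
            "qoq",
            "w4afp8",  # the entry added by this commit
        ],
        help="The quantization method.",
    )
    return parser


if __name__ == "__main__":
    # With the new entry in place, argparse accepts the value.
    args = build_parser().parse_args(["--quantization", "w4afp8"])
    print(args.quantization)
```

Without the added entry, `argparse` would reject `w4afp8` as an invalid choice and exit before any model-loading code runs, which is why the list has to be extended alongside the new kernel and loader code.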