100 Commits

Author SHA1 Message Date
Chang Su
f04c80dc42 Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-07 00:29:36 -07:00
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
AniZpZ
d95269f9b3 [2/3] fix dsv3 awq issue (#4625)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
2025-04-03 17:36:39 -07:00
Xiaoyu Zhang
e9c6ce461d sgl scaled_fp8_quant support output padding (#4861) 2025-04-02 23:53:57 +08:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
Jiaqi
72031173e4 fix: fix typo of comments in w8a8_fp8.py (#4843) 2025-03-27 21:06:47 -07:00
laixin
ae25d36dc6 [3/3] fix dsv3 awq issue (#4719)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
2025-03-26 23:13:43 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
Stefan He
4c584fc632 Fix circular imports in gptq.py and unblock test explorer (#4736) 2025-03-24 18:07:08 -07:00
Yun Dai
8cd4250401 [quantization] fix channelwise conversion with scalar weight scale (#4596) 2025-03-22 00:47:52 -07:00
lukec
4c56e5dbee Set deepgemm to the default value in the hopper architecture. (#4613) 2025-03-20 22:03:00 -07:00
Cheng Wan
7b5fc71972 fix SUPPORT_CUTLASS_BLOCK_FP8 flag (#4640) 2025-03-20 21:45:07 -07:00
strgrb
f9c53cbb42 Create col-major and tma-aligned x_scale for deep_gemm.gemm_fp8_fp8_bf16_nt (#4515)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-03-19 00:02:43 -07:00
Yineng Zhang
c16b33ccac cleanup deps 3/n (#4541) 2025-03-18 00:11:36 -07:00
Xiaoyu Zhang
dd865befde [Hotfix] solve fp8 w8a8 ci test fail (#4531) 2025-03-17 23:17:04 -07:00
Xiaoyu Zhang
9b81f9bd34 sglang quant module remove vllm dependency (#4507) 2025-03-17 15:51:59 -07:00
yiakwy-xpu-ml-framework-team
5f9b2c62ff [ROCm] fix dtype (#4510) 2025-03-17 05:20:50 -07:00
Stefan He
ef3c2dd08e Support Online Quantization for W8A8 (#4485) 2025-03-17 00:28:56 -07:00
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086)
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
2025-03-12 22:26:29 -07:00
Lianmin Zheng
c76040e31b Support page size > 1 (#4356) 2025-03-12 22:22:39 -07:00
AniZpZ
85ef7f64e4 [FIX] fix incorrect output when enable both deepgemm and torch compile (#4359)
Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
2025-03-12 21:34:09 -07:00
Yineng Zhang
d1da58e275 unify is_cuda and is_hip (#4321) 2025-03-11 18:12:56 -07:00
Ximingwang-09
0f2a2e3c19 Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-03-11 12:32:33 -07:00
lukec
dce303e279 linear support deepgemm (#4199)
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-03-11 00:38:37 -07:00
lambert0312
d3ecd63204 Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136) 2025-03-11 00:32:25 -07:00
HandH1998
2ac189edc8 Amd test fp8 (#4261) 2025-03-10 10:12:09 -07:00
Lianmin Zheng
00d25a7f5e Fix quantization and nightly tests (#4258) 2025-03-10 03:06:21 -07:00
Lianmin Zheng
e8a69e4d0c Clean up fp8 support (#4230) 2025-03-09 21:46:35 -07:00
HandH1998
0dd6cda288 Apply sgl w8a8 fp8 kernel (#3148) 2025-03-09 00:03:32 -08:00
HandH1998
c7f254468f [Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888)
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: b0urnee <2769086541@qq.com>
2025-03-06 20:54:52 -08:00
HAI
13bc39c5d6 ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) 2025-03-06 15:33:02 -08:00
yigex
5be8f1ed98 ROCM: AITER BLOCK GEMM (#4075) 2025-03-05 03:10:49 -08:00
Qubitium-ModelCloud
56a724eba3 [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
2025-03-05 01:11:00 -08:00
HAI
51d25405a7 ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053) 2025-03-04 03:00:46 -08:00
Xihuai Wang
12f2e6c3f1 Fix: #3988 using blockwise_int8 (#4023) 2025-03-03 23:49:58 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
laixin
1a6e97577a Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-02-24 05:43:35 -08:00
Zhiyu
c66b2c9cf1 Add support for nvidia modelopt fp8 kv cache (#3223) 2025-02-22 07:04:58 +08:00
HAI
5c54ef0352 AMD/ROCm: update AITER repo to ROCm/aiter (#3747) 2025-02-21 00:18:08 -08:00
yizhang2077
1eb8eade2b add control for cutlass fp8 blockwise gemm (#3727) 2025-02-20 16:10:35 +08:00
Wen-Heng (Jack) Chung
2eab113206 [ROCm] Add additional block quant GEMM tuning configs for AMD GPUs. (#3616)
Co-authored-by: HAI <hixiao@gmail.com>
2025-02-17 15:54:18 -08:00
Xiaoyu Zhang
c38f3aed24 support multi-gpu block-gemm tuning (#3639) 2025-02-18 00:00:35 +08:00
Yineng Zhang
5f1a485d9e Revert "[ROCm] Use tl.range() in block GEMM kernels with `num_stage… (#3632) 2025-02-17 18:01:21 +08:00
Wen-Heng (Jack) Chung
03caefeb51 [ROCm] Use tl.range() in block GEMM kernels with num_stages set by host. (#3535)
Co-authored-by: HAI <hixiao@gmail.com>
2025-02-16 01:40:38 -08:00
Wen-Heng (Jack) Chung
871a4aa1bf [ROCm] Add ROCm tuning configs for AMD Instinct MI325X. (#3536) 2025-02-12 20:09:36 -08:00
yizhang2077
98eecbda54 integrate blockwise fp8 kernel (#3529) 2025-02-13 04:39:33 +08:00
Liangsheng Yin
8616357a97 Fix deepseek awq v3 (#3450) 2025-02-12 22:09:52 +08:00