Commit Graph

116 Commits

Author SHA1 Message Date
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Baizhou Zhang
f6772f1497 [Fix] Turn off DeepGEMM by default (#5263) 2025-04-14 17:45:44 -07:00
JieXin Liang
bdde237562 [perf] experimental enhance fp8 per-tensor quant (#5370) 2025-04-14 12:35:43 -07:00
Yineng Zhang
f58b929a51 chore: upgrade sgl-kernel 0.0.8.post3 (#5342) 2025-04-13 00:45:59 -07:00
Yineng Zhang
611720919d fix: use deepgemm only on hopper (#5310) 2025-04-11 20:48:24 -07:00
HAI
8879944800 ROCm/AITER CK_MoE: update 2-stage kernels & support both Activations (#5228) 2025-04-10 18:19:57 -07:00
HandH1998
4065248214 Support Llama4 fp8 inference (#5194) 2025-04-09 20:14:34 +08:00
    Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
    Co-authored-by: zhyncs <me@zhyncs.com>
kk
92823069c4 Fix ci test "test_eval_fp8_accuracy" failed (#5185) 2025-04-09 02:44:05 -07:00
    Co-authored-by: wunhuang <wunhuang@amd.com>
Yineng Zhang
6669d12707 feat: add DeepGEMM build warning (#5176) 2025-04-08 21:16:23 -07:00
    Co-authored-by: grimoire <streetyao@live.com>
Trevor Morris
11d760d56a FP4 weight loading and inference (2/2) (#3972) 2025-04-08 17:26:21 -07:00
Yun Dai
2695ab0537 Fix loading KV quantization scale; Enable modelopt kv cache (#4686) 2025-04-08 09:11:35 -07:00
    Co-authored-by: qingquansong <ustcsqq@gmail.com>
kk
88d6fd9a11 Fix torch compile errors (#5158) 2025-04-08 15:04:37 +00:00
Yubo Wang
804d9f2e4c Add unit test on page_size > 1 and mla and integration test for Flash Attention 3 (#4760) 2025-04-07 23:20:51 -07:00
kk
5a144a8ab9 Fix run time error in ROCm platform (#5147) 2025-04-07 22:49:40 -07:00
    Co-authored-by: wunhuang <wunhuang@amd.com>
    Co-authored-by: root <root@dell300x-pla-t10-17.pla.dcgpu>
Xiaoyu Zhang
db452760e5 [ci] fix llama4 ci error (#5126) 2025-04-07 21:15:46 +08:00
HAI
819924748a Fix refactor error - fp8.py (#5106) 2025-04-07 00:34:08 -07:00
    Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Chang Su
f04c80dc42 Add Llama4 support (#5092) 2025-04-07 00:29:36 -07:00
    Co-authored-by: Cheng Wan <cwan39@gatech.edu>
    Co-authored-by: fzyzcjy <ch271828n@outlook.com>
    Co-authored-by: ispobock <ispobaoke@163.com>
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
AniZpZ
d95269f9b3 [2/3] fix dsv3 awq issue (#4625) 2025-04-03 17:36:39 -07:00
    Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
    Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Xiaoyu Zhang
e9c6ce461d sgl scaled_fp8_quant support output padding (#4861) 2025-04-02 23:53:57 +08:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
Jiaqi
72031173e4 fix: fix typo of comments in w8a8_fp8.py (#4843) 2025-03-27 21:06:47 -07:00
laixin
ae25d36dc6 [3/3] fix dsv3 awq issue (#4719) 2025-03-26 23:13:43 -07:00
    Co-authored-by: AniZpZ <aniz1905@gmail.com>
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
Stefan He
4c584fc632 Fix circular imports in gptq.py and unblock test explorer (#4736) 2025-03-24 18:07:08 -07:00
Yun Dai
8cd4250401 [quantization] fix channelwise conversion with scalar weight scale (#4596) 2025-03-22 00:47:52 -07:00
lukec
4c56e5dbee Set deepgemm to the default value in the hopper architecture. (#4613) 2025-03-20 22:03:00 -07:00
Cheng Wan
7b5fc71972 fix SUPPORT_CUTLASS_BLOCK_FP8 flag (#4640) 2025-03-20 21:45:07 -07:00
strgrb
f9c53cbb42 Create col-major and tma-aligned x_scale for deep_gemm.gemm_fp8_fp8_bf16_nt (#4515) 2025-03-19 00:02:43 -07:00
    Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
Yineng Zhang
c16b33ccac cleanup deps 3/n (#4541) 2025-03-18 00:11:36 -07:00
Xiaoyu Zhang
dd865befde [Hotfix] solve fp8 w8a8 ci test fail (#4531) 2025-03-17 23:17:04 -07:00
Xiaoyu Zhang
9b81f9bd34 sglang quant module remove vllm dependency (#4507) 2025-03-17 15:51:59 -07:00
yiakwy-xpu-ml-framework-team
5f9b2c62ff [ROCm] fix dtype (#4510) 2025-03-17 05:20:50 -07:00
Stefan He
ef3c2dd08e Support Online Quantization for W8A8 (#4485) 2025-03-17 00:28:56 -07:00
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086) 2025-03-12 22:26:29 -07:00
    Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
Lianmin Zheng
c76040e31b Support page size > 1 (#4356) 2025-03-12 22:22:39 -07:00
AniZpZ
85ef7f64e4 [FIX] fix incorrect output when enable both deepgemm and torch compile (#4359) 2025-03-12 21:34:09 -07:00
    Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
Yineng Zhang
d1da58e275 unify is_cuda and is_hip (#4321) 2025-03-11 18:12:56 -07:00
Ximingwang-09
0f2a2e3c19 Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) 2025-03-11 12:32:33 -07:00
    Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
lukec
dce303e279 linear support deepgemm (#4199) 2025-03-11 00:38:37 -07:00
    Co-authored-by: yinfan98 <1106310035@qq.com>
lambert0312
d3ecd63204 Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136) 2025-03-11 00:32:25 -07:00
HandH1998
2ac189edc8 Amd test fp8 (#4261) 2025-03-10 10:12:09 -07:00
Lianmin Zheng
00d25a7f5e Fix quantization and nightly tests (#4258) 2025-03-10 03:06:21 -07:00
Lianmin Zheng
e8a69e4d0c Clean up fp8 support (#4230) 2025-03-09 21:46:35 -07:00
HandH1998
0dd6cda288 Apply sgl w8a8 fp8 kernel (#3148) 2025-03-09 00:03:32 -08:00
HandH1998
c7f254468f [Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888) 2025-03-06 20:54:52 -08:00
    Co-authored-by: yych0745 <1398089567@qq.com>
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
    Co-authored-by: b0urnee <2769086541@qq.com>
HAI
13bc39c5d6 ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) 2025-03-06 15:33:02 -08:00
yigex
5be8f1ed98 ROCM: AITER BLOCK GEMM (#4075) 2025-03-05 03:10:49 -08:00
Qubitium-ModelCloud
56a724eba3 [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790) 2025-03-05 01:11:00 -08:00
    Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
    Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>