Commit Graph

33 Commits

Author SHA1 Message Date
bjmsong
17de02f98d Integration of TurboMind AWQ (#2828)
Co-authored-by: root <bjmsong@126.com>
2025-01-13 20:14:16 +08:00
kk
42f3909963 Unify sglang coding style (#2856)
Co-authored-by: Lin, Soga <soga.lin@amd.com>
2025-01-13 02:12:44 -08:00
Lianmin Zheng
72c7776355 Fix linear.py and improve weight loading (#2851)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-01-13 01:39:14 -08:00
kk
e808c1df3e Integrate ROCm ater package for ck moe function feasibility (#2854)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Lin, Soga <soga.lin@amd.com>
2025-01-13 08:23:07 +00:00
Ke Bao
85b2e05770 Add int8 quant kernel (#2848) 2025-01-13 13:16:58 +08:00
Ke Bao
b5fb4ef58a Update modelopt config and fix running issue (#2792) 2025-01-08 18:04:30 +08:00
Lianmin Zheng
8a6906127a Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-01-07 23:29:10 -08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Zhiyu
287427e2e6 Enable Nvidia's ModelOpt fp8 quantized models (#2535) 2025-01-06 14:54:52 -08:00
HAI
e6f523b5f2 fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) 2024-12-29 23:45:02 -08:00
HandH1998
afa0341e57 Update Triton configs for block fp8 kernels (#2641) 2024-12-29 22:53:47 +08:00
HAI
30828e7192 AMD: set weights and scaling numbers properly for block FP8 (#2637) 2024-12-29 03:23:39 -08:00
Yineng Zhang
7863e4368a add configs for block fp8 related kernels (#2628)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-28 23:12:04 +08:00
Xiaoyu Zhang
9254a33ad4 avoid fused_moe_triton padding circular import (#2624) 2024-12-28 14:01:35 +08:00
Yineng Zhang
635a042623 docs: update deepseek v3 example (#2592) 2024-12-26 17:43:37 +08:00
HandH1998
53aed988cb Refactor MoE (#2575)
Co-authored-by: zhyncs <me@zhyncs.com>
2024-12-26 00:02:14 +08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
HAI
95f93f493a Fp8 MoE optimizations on AMD (#2388) 2024-12-07 21:18:26 +08:00
Yineng Zhang
d332aa3b0c fix: resolve fp8 moe issue (#2387) 2024-12-07 19:28:53 +08:00
Yineng Zhang
84d96b3ae5 Move FP8 to SGLang (#2370)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2024-12-06 15:42:10 +08:00
HAI
b2986d7aa5 Adding SGLang FP8 Utils (#2348) 2024-12-04 03:01:33 -08:00
Lianmin Zheng
1228f7ca69 Fix gptq for moe layers (#2300)
Co-authored-by: root <me@zhyncs.com>
2024-12-03 23:12:33 +08:00
Yineng Zhang
55842eb81a feat: fused_moe fp8 monkey patch (#2174) 2024-11-25 17:06:36 +08:00
Lianmin Zheng
be0124bda0 Rename triton_fused_moe -> fused_moe_triton (#2163) 2024-11-24 08:12:35 -08:00
Yineng Zhang
b509db5832 feat: remove the dependency on FusedMoE (#2153) 2024-11-24 20:09:27 +08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Ke Bao
16eb33ffe2 Update vocab embedding deps and add TP switch (#1856) 2024-10-31 20:13:07 -07:00
Jani Monoses
3ff641132e Remove references to squeezellm (#1603) 2024-10-07 11:30:41 -07:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Lianmin Zheng
fb1f28cbbb Clean up the comments and names under python/sglang/srt/layers (#1047) 2024-08-12 05:54:37 +00:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Ying Sheng
2d96da813e refactor model loader: initial refactor (#655) 2024-07-19 09:27:06 -07:00