sglang

Author	SHA1	Message	Date
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321)	2025-03-11 18:12:56 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230)	2025-03-09 21:46:35 -07:00
HandH1998	0dd6cda288	Apply sgl w8a8 fp8 kernel (#3148)	2025-03-09 00:03:32 -08:00
HAI	13bc39c5d6	ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152)	2025-03-06 15:33:02 -08:00
HAI	51d25405a7	ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053)	2025-03-04 03:00:46 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
HAI	5c54ef0352	AMD/ROCm: update AITER repo to ROCm/aiter (#3747)	2025-02-21 00:18:08 -08:00
Ke Bao	c02e313914	Fix block wise fp8 torch compile (#3232)	2025-01-31 19:56:02 +08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170)	2025-01-27 00:23:37 -08:00
Yineng Zhang	5dc54f1a62	feat: remove vllm distributed (#2907) Co-authored-by: Zhangyi <1109276519@qq.com>	2025-01-17 22:31:51 +08:00
Yineng Zhang	bf8d07a6f9	feat: patch linear base (#2915)	2025-01-16 18:00:03 +08:00
kk	42f3909963	Unify sglang coding style (#2856) Co-authored-by: Lin, Soga <soga.lin@amd.com>	2025-01-13 02:12:44 -08:00
kk	e808c1df3e	Integrate ROCm ater package for ck moe function feasibility (#2854) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Lin, Soga <soga.lin@amd.com>	2025-01-13 08:23:07 +00:00
Lianmin Zheng	8a6906127a	Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-07 23:29:10 -08:00
HAI	e6f523b5f2	fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655)	2024-12-29 23:45:02 -08:00
HAI	30828e7192	AMD: set weights and scaling numbers properly for block FP8 (#2637)	2024-12-29 03:23:39 -08:00
Xiaoyu Zhang	9254a33ad4	avoid fused_moe_triton `padding` circular import (#2624)	2024-12-28 14:01:35 +08:00
HandH1998	53aed988cb	Refactor MoE (#2575) Co-authored-by: zhyncs <me@zhyncs.com>	2024-12-26 00:02:14 +08:00
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00
HAI	95f93f493a	Fp8 MoE optimizations on AMD (#2388)	2024-12-07 21:18:26 +08:00
Yineng Zhang	d332aa3b0c	fix: resolve fp8 moe issue (#2387)	2024-12-07 19:28:53 +08:00
Yineng Zhang	84d96b3ae5	Move FP8 to SGLang (#2370) Co-authored-by: HaiShaw <hixiao@gmail.com>	2024-12-06 15:42:10 +08:00
Lianmin Zheng	fb1f28cbbb	Clean up the comments and names under python/sglang/srt/layers (#1047)	2024-08-12 05:54:37 +00:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790)	2024-07-28 23:07:12 +10:00
Ying Sheng	2d96da813e	refactor model loader [unreachable code]: initial refactor (#655)	2024-07-19 09:27:06 -07:00

25 Commits