sglang

Author	SHA1	Message	Date
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230 )	2025-03-09 21:46:35 -07:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790 ) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	5dc54f1a62	feat: remove vllm distributed (#2907 ) Co-authored-by: Zhangyi <1109276519@qq.com>	2025-01-17 22:31:51 +08:00
Lianmin Zheng	72c7776355	Fix linear.py and improve weight loading (#2851 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-13 01:39:14 -08:00
Lianmin Zheng	8a6906127a	Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784 ) Co-authored-by: SangBin Cho rkooo567@gmail.com	2025-01-07 23:29:10 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287 )	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285 )	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215 ) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940 )	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926 )	2024-11-06 13:46:04 +00:00
Ke Bao	16eb33ffe2	Update vocab embedding deps and add TP switch (#1856 )	2024-10-31 20:13:07 -07:00

12 Commits