sglang

Author	SHA1	Message	Date
xianzhiT	9f1787fa60	Support multi-thread model weight loading (#7277)	2025-06-24 10:39:10 -07:00
Yuhong Guo	e5afb88b1c	Support weight loading without mmap (#7469)	2025-06-23 15:13:59 -07:00
Charles Chen	8c16da334e	Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. (#7164)	2025-06-17 11:26:23 -07:00
HandH1998	4065248214	Support Llama4 fp8 inference (#5194) Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: zhyncs <me@zhyncs.com>	2025-04-09 20:14:34 +08:00
DangKai	cc88d98ab8	fix empty_cache error in pt_weights_iterator (#5151) Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>	2025-04-08 01:22:10 -07:00
Brayden Zhong	e84f4ba0ab	[Misc] Fix issues reported by torchfix (#4837)	2025-03-27 20:10:32 -07:00
Juwan Yoo	188105a21b	deps: lazy import optional dependencies `gguf` and `torchvision` (#4826)	2025-03-27 14:35:36 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229)	2025-03-11 12:35:35 -07:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Zhiyu	c66b2c9cf1	Add support for nvidia modelopt fp8 kv cache (#3223)	2025-02-22 07:04:58 +08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
lukec	b8ab989ff4	Fix the FP8 E4M3 parsing offline scales failure bug (#3045)	2025-01-22 14:19:33 -08:00
bjmsong	d3024f4fc8	support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894) Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-18 11:43:22 +08:00
Yineng Zhang	5dc54f1a62	feat: remove vllm distributed (#2907) Co-authored-by: Zhangyi <1109276519@qq.com>	2025-01-17 22:31:51 +08:00
Yineng Zhang	85e1a6f3aa	Update model_loader deps and qqq quantization deps (#2220) (#2318) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-02 23:22:13 +08:00

16 Commits