sglang

Author	SHA1	Message	Date
Matt Nappo	8c57490210	[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873) Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>	2025-10-03 16:48:19 +08:00
fzyzcjy	5e786cca3a	Support single batch overlap (#10422)	2025-10-02 18:04:36 +08:00
narutolhy	d17986f8c6	Enable optional FP32 compute for LM Head (#10729) Thanks to MiniMax Team and Chenyang Zhao's support.	2025-09-29 20:45:17 -07:00
Lianmin Zheng	f68dd998b9	Rename customer label -> custom label (#10899) Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-25 16:19:53 -07:00
kushanam	d7b20dd65d	chore: Initial support for input config files (#10534) Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-24 14:45:52 -07:00
Philip Kiely - Baseten	7f028b07c4	Fix formatting in long code blocks (#10528)	2025-09-16 12:02:05 -07:00
Baizhou Zhang	8ad700f735	Cleaning codes for speculative attention mode (#10149)	2025-09-08 17:38:06 -07:00
Yingchun Lai	b32ab0705e	metrics: support customer buckets for prompt/generation_tokens_histogram (#9634)	2025-09-04 22:22:08 +08:00
Zhiqiang Xie	001f51940a	[HiCache] change the default policy to write through (#9772)	2025-08-28 18:28:39 -07:00
Lifu Huang	b0980af89f	Support pinning adapter via server args. (#9249)	2025-08-20 16:25:01 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849)	2025-08-14 21:14:53 -07:00
Lianmin Zheng	2e8e7e353b	Improve docs and developer guide (#9044)	2025-08-10 21:05:18 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031)	2025-08-10 19:49:45 -07:00

13 Commits