Commit Graph

35 Commits

Author SHA1 Message Date
narutolhy
d17986f8c6 Enable optional FP32 compute for LM Head (#10729)
Thanks to the MiniMax Team and Chenyang Zhao for their support.
2025-09-29 20:45:17 -07:00
Lianmin Zheng
dda34c2f93 Fix mem fraction static for nightly tests (#11076) 2025-09-29 12:57:41 -07:00
Lianmin Zheng
f68dd998b9 Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-25 16:19:53 -07:00
kushanam
d7b20dd65d chore: Initial support for input config files (#10534)
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-09-24 14:45:52 -07:00
Lifu Huang
08ecd0aa2a [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) 2025-09-20 22:47:48 -07:00
Philip Kiely - Baseten
7f028b07c4 Fix formatting in long code blocks (#10528) 2025-09-16 12:02:05 -07:00
Lifu Huang
3f41b48c40 [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) 2025-09-15 16:04:03 -07:00
Baizhou Zhang
8ad700f735 Cleaning codes for speculative attention mode (#10149) 2025-09-08 17:38:06 -07:00
Yineng Zhang
b7d1f17b8d Revert "enable auto-round quantization model (#6226)" (#10148) 2025-09-07 22:31:11 -07:00
Weiwei
c8295d2353 enable auto-round quantization model (#6226)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
2025-09-07 22:05:35 -07:00
Liangsheng Yin
6e95f5e5bd Simplify Router arguments passing and build it in docker image (#9964) 2025-09-05 12:13:55 +08:00
Yingchun Lai
b32ab0705e metrics: support customer buckets for prompt/generation_tokens_histogram (#9634) 2025-09-04 22:22:08 +08:00
Huapeng Zhou
75ee00112d [Doc] Fix SGLang tool parser doc (#9886) 2025-09-04 21:52:53 +08:00
Lianmin Zheng
60e37f8028 Move parsers under a single folder (#9912) 2025-09-02 18:25:04 -07:00
Lifu Huang
1fbfdebe6b [chore] fix dead links in doc (#9913) 2025-09-02 00:28:26 -07:00
Zhiqiang Xie
001f51940a [HiCache] change the default policy to write through (#9772) 2025-08-28 18:28:39 -07:00
yhyang201
a85363c199 [docs] Instructions for bench_serving.py (#9071)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-26 18:30:57 -07:00
Xiaotong Jiang
1a0896e9c0 [doc] add kimik2 --tool-call-parser (#9647) 2025-08-26 10:39:40 -07:00
Chayenne
9b08d975a0 [docs] Refactor, remove compiled results and add gpt-oss (#9613)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-25 15:27:06 -07:00
Xinyuan Tong
13ec8d427e [Docs]Update reasoning parser doc & fix outdated link (#9492)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-21 22:08:28 -07:00
Chayenne
05bd789791 [docs]: fix reasoning context in docs (#9483) 2025-08-21 20:04:12 -07:00
Lifu Huang
b0980af89f Support pinning adapter via server args. (#9249) 2025-08-20 16:25:01 -07:00
Yineng Zhang
7e8187e004 docs: fix spec (#9326) 2025-08-18 19:35:46 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
jacky.cheng
25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660)
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
2025-08-12 13:49:11 -07:00
Simo Lin
1ce30dd13e [router] update router documentation (#9121) 2025-08-12 13:16:34 -07:00
Zhiqiang Xie
0eec4cb6cc HiCache, add bench long context plus minor fixes (#9086)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 16:54:52 -07:00
Faraz
f508cd3cb7 TRTLLM-MLA FP8 path (#8638)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-08-11 14:02:13 -07:00
Lianmin Zheng
8c07fabda7 Update hyperparameter_tuning.md (#9083) 2025-08-11 13:44:11 -07:00
Liangsheng Yin
f9afa7dceb Fix docs for clip max new tokens (#9082) 2025-08-11 13:15:21 -07:00
Jimmy
0d9e89ec69 [PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866) 2025-08-11 13:08:11 -07:00
Hangzhi
3d64fda376 Fix broken Kimi models HuggingFace link (#9080) 2025-08-11 12:15:00 -07:00
Baizhou Zhang
75e6a7cde1 Support radix cache for Lora feature (#7216) 2025-08-11 10:14:11 -07:00
Lianmin Zheng
2e8e7e353b Improve docs and developer guide (#9044) 2025-08-10 21:05:18 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00