sglang

Author	SHA1	Message	Date
Baizhou Zhang	983ef22cf3	[Doc] Update deterministic inference flag in server_arguments.md (#11978 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-22 14:12:15 -07:00
Minglei Zhu	200a3c0bb1	[Documentation] add doc for deterministic inference (#11956 )	2025-10-22 12:36:15 -05:00
Zhiyu	80b2b3207a	Enable native ModelOpt quantization support (3/3) (#10154 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-10-21 21:44:29 -07:00
Baizhou Zhang	ef4a8097b8	Rename flashmla kernel options of nsa backend for better readability (#11876 )	2025-10-21 13:14:16 -07:00
Neelabh Sinha	852c0578fd	[FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570 )	2025-10-21 15:44:33 +08:00
Meng, Hengyu	b113c72e7a	Init attention backend for Intel XPU (#10656 ) Co-authored-by: guangyey <guangye.yu@intel.com> Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>	2025-10-21 11:41:28 +08:00
DarkSharpness	276e7b3e4e	[Feature] New structural tag support (#10691 )	2025-10-20 18:25:58 +08:00
Shangming Cai	a2ba0bc3df	Tiny clean up for PD module and doc (#11747 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-20 11:52:42 +08:00
Baizhou Zhang	44f0ece9fc	[Doc] Update documents for FA4 (#11778 )	2025-10-19 17:40:38 -07:00
b8zhong	f9a7d9b3dc	support server arg override KV cache to bf16 to avoid slow cases (#11749 )	2025-10-19 02:49:48 +08:00
Keyang Ru	2bc3fcd420	[doc] update router document (#11767 )	2025-10-17 10:26:54 -07:00
b8zhong	6bc503af73	[Doc] Update support matrix for attn and hybrid attn (#11293 )	2025-10-14 22:43:11 -07:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423 ) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
Simo Lin	e0c2af2ac2	[router] update router doc to latest features (#11639 )	2025-10-14 18:32:30 -07:00
Wenyi Xu	642fa966f2	[Docs] [Router]: Update sg-router doc on circuit breaker (#11449 )	2025-10-14 02:18:14 -07:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041 )	2025-10-13 20:18:25 -07:00
Xiaoyu Zhang	88a6f9dab5	bench_serving support PD Disaggregation (#11542 )	2025-10-13 19:43:26 -07:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460 )	2025-10-13 15:47:22 +08:00
Jonah Bernard	8e776c78a1	docs(router): add token-bucket rate limiting to the docs (#11485 )	2025-10-12 20:03:27 -07:00
Lianmin Zheng	2ac46e94ef	Sync changes on io_struct.py and deterministic ops (#11498 )	2025-10-12 16:03:10 -07:00
ykcombat	f5754d1256	[Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427 )	2025-10-11 21:36:07 +08:00
Shangming Cai	0a7c4bded7	[Doc] Update mooncake nvlink transport doc for PD disaggregation (#11321 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-08 00:59:29 -07:00
Cheng Wan	3c06b673af	[8/N] MoE Refactor: deprecate `EPMoE` (#11211 )	2025-10-07 21:51:41 -07:00
Xinyuan Tong	e3c7f09146	Update tool parser and related documentation (#11223 )	2025-10-07 11:03:40 -07:00
hzh0425	df08bf9b9f	[Doc]: Best Practice for HICache (#11001 ) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-08 00:59:21 +08:00
ykwd	69efdd27bc	[Doc] HiCache Design Documents (#11027 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-08 00:35:45 +08:00
Wenyi Xu	0958a39704	[Docs] [Router] Update Observability and Common Issues Section (#11302 )	2025-10-07 08:03:09 -07:00
Lianmin Zheng	708f4ff490	Rename max_micro_batch_size -> pp_max_micro_batch_size (#11279 )	2025-10-06 15:50:56 -07:00
Matt Nappo	8c57490210	[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873 ) Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>	2025-10-03 16:48:19 +08:00
fzyzcjy	5e786cca3a	Support single batch overlap (#10422 )	2025-10-02 18:04:36 +08:00
narutolhy	d17986f8c6	Enable optional FP32 compute for LM Head (#10729 ) Thanks to MiniMax Team and Chenyang Zhao's support.	2025-09-29 20:45:17 -07:00
Lianmin Zheng	dda34c2f93	Fix mem fraction static for nightly tests (#11076 )	2025-09-29 12:57:41 -07:00
Lianmin Zheng	f68dd998b9	Rename customer label -> custom label (#10899 ) Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-25 16:19:53 -07:00
kushanam	d7b20dd65d	chore: Initial support for input config files (#10534 ) Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-24 14:45:52 -07:00
Lifu Huang	08ecd0aa2a	[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592 )	2025-09-20 22:47:48 -07:00
Philip Kiely - Baseten	7f028b07c4	Fix formatting in long code blocks (#10528 )	2025-09-16 12:02:05 -07:00
Lifu Huang	3f41b48c40	[2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286 )	2025-09-15 16:04:03 -07:00
Baizhou Zhang	8ad700f735	Cleaning codes for speculative attention mode (#10149 )	2025-09-08 17:38:06 -07:00
Yineng Zhang	b7d1f17b8d	Revert "enable auto-round quantization model (#6226 )" (#10148 )	2025-09-07 22:31:11 -07:00
Weiwei	c8295d2353	enable auto-round quantization model (#6226 ) Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>	2025-09-07 22:05:35 -07:00
Liangsheng Yin	6e95f5e5bd	Simplify `Router` arguments passing and build it in docker image (#9964 )	2025-09-05 12:13:55 +08:00
Yingchun Lai	b32ab0705e	metrics: support customer buckets for prompt/generation_tokens_histogram (#9634 )	2025-09-04 22:22:08 +08:00
Huapeng Zhou	75ee00112d	[Doc] Fix SGLang tool parser doc (#9886 )	2025-09-04 21:52:53 +08:00
Lianmin Zheng	60e37f8028	Move parsers under a single folder (#9912 )	2025-09-02 18:25:04 -07:00
Lifu Huang	1fbfdebe6b	[chore] fix dead links in doc (#9913 )	2025-09-02 00:28:26 -07:00
Zhiqiang Xie	001f51940a	[HiCache] change the default policy to write through (#9772 )	2025-08-28 18:28:39 -07:00
yhyang201	a85363c199	[docs] Instructions for bench_serving.py (#9071 ) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-08-26 18:30:57 -07:00
Xiaotong Jiang	1a0896e9c0	[doc] add kimik2 --tool-call-parser (#9647 )	2025-08-26 10:39:40 -07:00
Chayenne	9b08d975a0	[docs] Refactor, remove compiled results and add gpt-oss (#9613 ) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-25 15:27:06 -07:00
Xinyuan Tong	13ec8d427e	[Docs]Update reasoning parser doc & fix outdated link (#9492 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 22:08:28 -07:00

65 Commits