sglang

Author	SHA1	Message	Date
Lianmin Zheng	9eefe2c0b7	Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Cheng Wan <cwan@x.ai> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-10-17 17:30:06 -07:00
Lianmin Zheng	b9a54e0968	[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-17 14:25:22 -07:00
Keyang Ru	2bc3fcd420	[doc] update router document (#11767 )	2025-10-17 10:26:54 -07:00
sglang-bot	85ebeecf06	chore: bump SGLang version to 0.5.3.post3 (#11693 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-16 13:14:55 -07:00
Lianmin Zheng	cd7e1bd591	Sync code and test CI; rename some env vars (#11686 )	2025-10-15 18:37:03 -07:00
sglang-bot	baf277a9bf	chore: bump SGLang version to 0.5.3.post2 (#11680 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-15 16:49:14 -07:00
Fan Yin	5464457251	[sgl-kernel] Optimize gguf test (#11667 )	2025-10-15 15:45:53 -07:00
Yineng Zhang	ab9187a20b	docs: update sglang installation guide (#11659 )	2025-10-15 00:35:48 -07:00
b8zhong	6bc503af73	[Doc] Update support matrix for attn and hybrid attn (#11293 )	2025-10-14 22:43:11 -07:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423 ) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
Simo Lin	e0c2af2ac2	[router] update router doc to latest features (#11639 )	2025-10-14 18:32:30 -07:00
Lianmin Zheng	d314bf6010	Update install.md (#11631 )	2025-10-14 14:34:46 -07:00
Wenyi Xu	642fa966f2	[Docs] [Router]: Update sg-router doc on circuit breaker (#11449 )	2025-10-14 02:18:14 -07:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041 )	2025-10-13 20:18:25 -07:00
Xiaoyu Zhang	88a6f9dab5	bench_serving support PD Disaggregation (#11542 )	2025-10-13 19:43:26 -07:00
Neelabh Sinha	aaf7af1b17	[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413 )	2025-10-14 09:20:17 +08:00
Liangsheng Yin	acc2327bbd	Move deep gemm related arguments to `sglang.srt.environ` (#11547 )	2025-10-14 00:34:35 +08:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460 )	2025-10-13 15:47:22 +08:00
Jonah Bernard	8e776c78a1	docs(router): add token-bucket rate limiting to the docs (#11485 )	2025-10-12 20:03:27 -07:00
Lianmin Zheng	2ac46e94ef	Sync changes on io_struct.py and deterministic ops (#11498 )	2025-10-12 16:03:10 -07:00
Glen Liu	47c606d3dc	[Feature] support regex strings as a stopping condition (#10635 )	2025-10-12 10:53:15 +08:00
ykcombat	f5754d1256	[Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427 )	2025-10-11 21:36:07 +08:00
Zaili Wang	f19613e6c3	Dedicated toml files for CPU/XPU (#10734 )	2025-10-10 00:44:55 -07:00
sglang-bot	758b887ad1	chore: bump SGLang version to 0.5.3.post1 (#11324 )	2025-10-09 15:19:59 -07:00
Netanel Haber	d6837aea4d	model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909 ) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-10-09 00:37:38 +08:00
Kevin Xiang Li	e3bb7f5ae6	benchmark: enhance configurable multimodal benchmarking in bench_serving (#9812 ) Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-08 01:31:36 -07:00
Shangming Cai	0a7c4bded7	[Doc] Update mooncake nvlink transport doc for PD disaggregation (#11321 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-08 00:59:29 -07:00
Cheng Wan	3c06b673af	[8/N] MoE Refactor: deprecate `EPMoE` (#11211 )	2025-10-07 21:51:41 -07:00
Adarsh Shirawalmath	7c3f07dbcb	[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545 )	2025-10-08 12:38:48 +08:00
Xinyuan Tong	c4d77774e1	update sampling_params documentation with defaults (#11315 )	2025-10-07 18:36:26 -07:00
Xinyuan Tong	e3c7f09146	Update tool parser and related documentation (#11223 )	2025-10-07 11:03:40 -07:00
hzh0425	df08bf9b9f	[Doc]: Best Practice for HICache (#11001 ) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-08 00:59:21 +08:00
ykwd	69efdd27bc	[Doc] HiCache Design Documents (#11027 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-10-08 00:35:45 +08:00
Wenyi Xu	0958a39704	[Docs] [Router] Update Observability and Common Issues Section (#11302 )	2025-10-07 08:03:09 -07:00
Lianmin Zheng	708f4ff490	Rename max_micro_batch_size -> pp_max_micro_batch_size (#11279 )	2025-10-06 15:50:56 -07:00
sglang-bot	a4a3d82393	chore: bump SGLang version to 0.5.3 (#11263 )	2025-10-06 20:07:02 +08:00
sglang-bot	0b13cbb7c9	chore: bump SGLang version to 0.5.3rc2 (#11259 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-06 01:12:10 -07:00
Lianmin Zheng	d645ae90a3	Rename runner labels (#11228 )	2025-10-05 18:05:41 -07:00
Praneth Paruchuri	fad7ca73f8	model: support starcoder2 (#10609 )	2025-10-04 00:11:19 +08:00
Matt Nappo	8c57490210	[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873 ) Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>	2025-10-03 16:48:19 +08:00
fzyzcjy	5e786cca3a	Support single batch overlap (#10422 )	2025-10-02 18:04:36 +08:00
Xinyuan Tong	a9ce2bcb3c	[Doc] Update multimodal language models documentation (#11111 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2025-09-30 22:10:31 -07:00
narutolhy	d17986f8c6	Enable optional FP32 compute for LM Head (#10729 ) Thanks to MiniMax Team and Chenyang Zhao's support.	2025-09-29 20:45:17 -07:00
Lianmin Zheng	dda34c2f93	Fix mem fraction static for nightly tests (#11076 )	2025-09-29 12:57:41 -07:00
Lianmin Zheng	f68dd998b9	Rename customer label -> custom label (#10899 ) Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-25 16:19:53 -07:00
kushanam	d7b20dd65d	chore: Initial support for input config files (#10534 ) Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-24 14:45:52 -07:00
Lianmin Zheng	b1f0fc1c0b	Add CI timeout guidelines (#10829 )	2025-09-23 22:08:02 -07:00
Even Zhou	d27a6f7092	[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130 )	2025-09-22 17:17:48 -07:00
Adarsh Shirawalmath	592caab66a	[Docs, minor] Fix LLM doc matrix (#10753 )	2025-09-23 01:29:55 +08:00
Lifu Huang	08ecd0aa2a	[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592 )	2025-09-20 22:47:48 -07:00

721 Commits