sglang

Author	SHA1	Message	Date
narutolhy	d17986f8c6	Enable optional FP32 compute for LM Head (#10729) Thanks to MiniMax Team and Chenyang Zhao's support.	2025-09-29 20:45:17 -07:00
Lianmin Zheng	dda34c2f93	Fix mem fraction static for nightly tests (#11076)	2025-09-29 12:57:41 -07:00
Lianmin Zheng	f68dd998b9	Rename customer label -> custom label (#10899) Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-25 16:19:53 -07:00
kushanam	d7b20dd65d	chore: Initial support for input config files (#10534) Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-24 14:45:52 -07:00
Lianmin Zheng	b1f0fc1c0b	Add CI timeout guidelines (#10829)	2025-09-23 22:08:02 -07:00
Even Zhou	d27a6f7092	[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130)	2025-09-22 17:17:48 -07:00
Adarsh Shirawalmath	592caab66a	[Docs, minor] Fix LLM doc matrix (#10753)	2025-09-23 01:29:55 +08:00
Lifu Huang	08ecd0aa2a	[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592)	2025-09-20 22:47:48 -07:00
Zaili Wang	6fd4816d9f	Fix sgl_kernel import failure on devices other than CUDA (#10610)	2025-09-18 11:38:02 -07:00
Philip Kiely - Baseten	7f028b07c4	Fix formatting in long code blocks (#10528)	2025-09-16 12:02:05 -07:00
Zaili Wang	925dbb3218	[CPU] fix CPU backend sel. issue for Llama4 (#10511)	2025-09-16 02:57:45 -07:00
Lifu Huang	3f41b48c40	[2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286)	2025-09-15 16:04:03 -07:00
Praneth Paruchuri	a45d9a4ee8	model: support solar (#8189)	2025-09-16 02:21:13 +08:00
Yineng Zhang	86a32bb5cd	chore: bump v0.5.3rc0 (#10468)	2025-09-15 03:55:18 -07:00
Lianmin Zheng	50dc0c1e9c	Run tests based on labels (#10456)	2025-09-15 00:29:20 -07:00
Vincent Zhong	0b14159fc4	Add reasoning examples for GPT-OSS in Markdown examples (#9626)	2025-09-15 11:27:40 +08:00
Feng Su	4c21b09074	[Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962) Signed-off-by: Feng Su <sufeng@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peng Wang <rocking@linux.alibaba.com>	2025-09-15 02:08:02 +08:00
Shu Wang	3df05f4d6a	[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199)	2025-09-11 20:18:43 -07:00
Zaili Wang	7bc5fb0d78	[CPU][doc] add torch.compile param in example commands (#10349)	2025-09-11 19:22:46 -07:00
Yineng Zhang	b0d25e72c4	chore: bump v0.5.2 (#10221)	2025-09-11 16:09:20 -07:00
Yi Zhang	760b788a58	add qwen3-next doc (#10327)	2025-09-11 14:29:11 -07:00
Zaili Wang	ef959d7b85	[CPU] fix OOM when mem-fraction is not set (#9090)	2025-09-10 23:52:22 -07:00
Glen Liu	ebd0e1c18b	[doc] add walkthrough for implementing and hosting a simple llama wrapper m… (#10093)	2025-09-10 12:05:06 +08:00
Shakhizat Nurgaliyev	2fe17735a6	Updated Nvidia Jetson docs (#4422)	2025-09-09 11:41:21 +08:00
geray	ba066ca02f	Update link for EAGLE speculative decoding (#10191)	2025-09-09 11:09:50 +08:00
Baizhou Zhang	8ad700f735	Cleaning codes for speculative attention mode (#10149)	2025-09-08 17:38:06 -07:00
Teng Ma	a02071a12c	[Bench] feat: mooncake trace integration (#9839) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>	2025-09-09 02:50:54 +08:00
Yineng Zhang	b7d1f17b8d	Revert "enable auto-round quantization model (#6226)" (#10148)	2025-09-07 22:31:11 -07:00
Weiwei	c8295d2353	enable auto-round quantization model (#6226) Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>	2025-09-07 22:05:35 -07:00
Cao E	7577f0e40f	Add graph runner support with torch compile on CPU (#7843)	2025-09-07 21:33:58 -07:00
eigen	b0fcbb74d0	[DOC]: some minor updates (#10134)	2025-09-07 14:58:15 -07:00
Liangsheng Yin	6e95f5e5bd	Simplify `Router` arguments passing and build it in docker image (#9964)	2025-09-05 12:13:55 +08:00
Yineng Zhang	fa9c82d339	chore: bump v0.5.2rc2 (#10050)	2025-09-04 20:07:27 -07:00
Yingchun Lai	b32ab0705e	metrics: support customer buckets for prompt/generation_tokens_histogram (#9634)	2025-09-04 22:22:08 +08:00
Huapeng Zhou	75ee00112d	[Doc] Fix SGLang tool parser doc (#9886)	2025-09-04 21:52:53 +08:00
Lianmin Zheng	60e37f8028	Move parsers under a single folder (#9912)	2025-09-02 18:25:04 -07:00
Yineng Zhang	18f91eb639	chore: bump v0.5.2rc1 (#9920)	2025-09-02 04:43:34 -07:00
Lifu Huang	1fbfdebe6b	[chore] fix dead links in doc (#9913)	2025-09-02 00:28:26 -07:00
Yineng Zhang	16e56ea693	chore: bump v0.5.2rc0 (#9862)	2025-09-01 03:07:36 -07:00
Zhiqiang Xie	001f51940a	[HiCache] change the default policy to write through (#9772)	2025-08-28 18:28:39 -07:00
Yineng Zhang	bc80dc4ce0	chore: bump v0.5.1.post3 (#9716)	2025-08-27 15:42:42 -07:00
yhyang201	a85363c199	[docs] Instructions for bench_serving.py (#9071) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-08-26 18:30:57 -07:00
Xiaotong Jiang	1a0896e9c0	[doc] add kimik2 --tool-call-parser (#9647)	2025-08-26 10:39:40 -07:00
Netanel Haber	4cd08dc592	model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301)	2025-08-26 15:33:40 +08:00
Chayenne	9b08d975a0	[docs] Refactor, remove compiled results and add gpt-oss (#9613) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-25 15:27:06 -07:00
Yineng Zhang	e3e97a120b	chore: bump v0.5.1.post2 (#9592)	2025-08-25 03:45:09 -07:00
Xinyuan Tong	ca4b86c564	fix: Update OpenAI client base URL in documentation (#9576)	2025-08-24 23:06:57 -07:00
Yineng Zhang	e0ab167db0	chore: bump v0.5.1.post1 (#9558)	2025-08-24 01:14:17 -07:00
Xiaotong Jiang	80425e59bb	[doc] deepseekv31 support (#9544)	2025-08-23 16:54:58 -07:00
Lianmin Zheng	97a38ee85b	Release 0.5.1 (#9533)	2025-08-23 07:09:26 -07:00

679 Commits