sglang

Author	SHA1	Message	Date
Shu Wang	3df05f4d6a	[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199 )	2025-09-11 20:18:43 -07:00
Zaili Wang	7bc5fb0d78	[CPU][doc] add torch.compile param in example commands (#10349 )	2025-09-11 19:22:46 -07:00
Yineng Zhang	b0d25e72c4	chore: bump v0.5.2 (#10221 )	2025-09-11 16:09:20 -07:00
Yi Zhang	760b788a58	add qwen3-next doc (#10327 )	2025-09-11 14:29:11 -07:00
Zaili Wang	ef959d7b85	[CPU] fix OOM when mem-fraction is not set (#9090 )	2025-09-10 23:52:22 -07:00
Glen Liu	ebd0e1c18b	[doc] add walkthrough for implementing and hosting a simple llama wrapper m… (#10093 )	2025-09-10 12:05:06 +08:00
Shakhizat Nurgaliyev	2fe17735a6	Updated Nvidia Jetson docs (#4422 )	2025-09-09 11:41:21 +08:00
geray	ba066ca02f	Update link for EAGLE speculative decoding (#10191 )	2025-09-09 11:09:50 +08:00
Baizhou Zhang	8ad700f735	Cleaning codes for speculative attention mode (#10149 )	2025-09-08 17:38:06 -07:00
Teng Ma	a02071a12c	[Bench] feat: mooncake trace integration (#9839 ) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>	2025-09-09 02:50:54 +08:00
Yineng Zhang	b7d1f17b8d	Revert "enable auto-round quantization model (#6226 )" (#10148 )	2025-09-07 22:31:11 -07:00
Weiwei	c8295d2353	enable auto-round quantization model (#6226 ) Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>	2025-09-07 22:05:35 -07:00
Cao E	7577f0e40f	Add graph runner support with torch compile on CPU (#7843 )	2025-09-07 21:33:58 -07:00
eigen	b0fcbb74d0	[DOC]: some minor updates (#10134 )	2025-09-07 14:58:15 -07:00
Liangsheng Yin	6e95f5e5bd	Simplify `Router` arguments passing and build it in docker image (#9964 )	2025-09-05 12:13:55 +08:00
Yineng Zhang	fa9c82d339	chore: bump v0.5.2rc2 (#10050 )	2025-09-04 20:07:27 -07:00
Yingchun Lai	b32ab0705e	metrics: support customer buckets for prompt/generation_tokens_histogram (#9634 )	2025-09-04 22:22:08 +08:00
Huapeng Zhou	75ee00112d	[Doc] Fix SGLang tool parser doc (#9886 )	2025-09-04 21:52:53 +08:00
Lianmin Zheng	60e37f8028	Move parsers under a single folder (#9912 )	2025-09-02 18:25:04 -07:00
Yineng Zhang	18f91eb639	chore: bump v0.5.2rc1 (#9920 )	2025-09-02 04:43:34 -07:00
Lifu Huang	1fbfdebe6b	[chore] fix dead links in doc (#9913 )	2025-09-02 00:28:26 -07:00
Yineng Zhang	16e56ea693	chore: bump v0.5.2rc0 (#9862 )	2025-09-01 03:07:36 -07:00
Zhiqiang Xie	001f51940a	[HiCache] change the default policy to write through (#9772 )	2025-08-28 18:28:39 -07:00
Yineng Zhang	bc80dc4ce0	chore: bump v0.5.1.post3 (#9716 )	2025-08-27 15:42:42 -07:00
yhyang201	a85363c199	[docs] Instructions for bench_serving.py (#9071 ) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-08-26 18:30:57 -07:00
Xiaotong Jiang	1a0896e9c0	[doc] add kimik2 --tool-call-parser (#9647 )	2025-08-26 10:39:40 -07:00
Netanel Haber	4cd08dc592	model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301 )	2025-08-26 15:33:40 +08:00
Chayenne	9b08d975a0	[docs] Refactor, remove compiled results and add gpt-oss (#9613 ) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-25 15:27:06 -07:00
Yineng Zhang	e3e97a120b	chore: bump v0.5.1.post2 (#9592 )	2025-08-25 03:45:09 -07:00
Xinyuan Tong	ca4b86c564	fix: Update OpenAI client base URL in documentation (#9576 )	2025-08-24 23:06:57 -07:00
Yineng Zhang	e0ab167db0	chore: bump v0.5.1.post1 (#9558 )	2025-08-24 01:14:17 -07:00
Xiaotong Jiang	80425e59bb	[doc] deepseekv31 support (#9544 )	2025-08-23 16:54:58 -07:00
Lianmin Zheng	97a38ee85b	Release 0.5.1 (#9533 )	2025-08-23 07:09:26 -07:00
Xinyuan Tong	fedfe91c1a	[Docs] Add doc and quick demo for gpt-oss responses api & buildin tools (#9497 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 23:51:52 -07:00
Xinyuan Tong	13ec8d427e	[Docs]Update reasoning parser doc & fix outdated link (#9492 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 22:08:28 -07:00
Chayenne	05bd789791	[docs]: fix reasoning context in docs (#9483 )	2025-08-21 20:04:12 -07:00
Xinyuan Tong	0b3a5b1151	Update reasoning parser doc (#9468 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 17:25:30 -07:00
Xinyuan Tong	e8449ab515	Add deepseek v3.1 thinking parser support and update docs (#9464 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-21 15:09:40 -07:00
Lifu Huang	b0980af89f	Support pinning adapter via server args. (#9249 )	2025-08-20 16:25:01 -07:00
Lianmin Zheng	1ec9769753	[Docs] Update contribution guide (#9383 )	2025-08-19 23:37:45 -07:00
Lianmin Zheng	f20b6a3f2b	[minor] Sync style changes (#9376 )	2025-08-19 21:35:01 -07:00
Lianmin Zheng	ecc9f3e47a	[Minor] Fix the style of sgl-kernel (#9332 )	2025-08-18 23:45:00 -07:00
Yineng Zhang	7e8187e004	docs: fix spec (#9326 )	2025-08-18 19:35:46 -07:00
Lianmin Zheng	c480a3f6ea	Minor style fixes for sgl-kernel (#9289 )	2025-08-18 09:38:35 -07:00
Netanel Haber	845d12a979	model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067 ) Co-authored-by: Kyle Huang <kylhuang@nvidia.com>	2025-08-17 01:48:15 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849 )	2025-08-14 21:14:53 -07:00
Yineng Zhang	fab0f6e77d	chore: bump v0.5.0rc2 (#9203 )	2025-08-14 16:11:16 -07:00
Chengxing Xie	c1c7dc4534	feat: Add model version tracking with API endpoints and response metadata (#8795 )	2025-08-14 12:13:46 -07:00
Lianmin Zheng	9e426466af	Clean up allocators (#9134 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-13 13:56:04 -07:00
Zhihao Liu	65736dc524	[Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model (#7957 )	2025-08-13 11:14:54 -07:00

662 Commits