sglang

Author	SHA1	Message	Date
fzyzcjy	f0653886a5	Expert distribution recording without overhead for EPLB (#4957)	2025-05-19 20:07:43 -07:00
simveit	506e5de8fe	Improve supported models doc (#6430)	2025-05-20 01:43:35 +08:00
applesaucethebun	6dc6b30637	Add missing model to doc (#6396) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-18 12:57:58 -07:00
Vincent Zhong	e9ef39d2e9	docs: Update the MD files (#6373) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-17 09:23:16 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Brayden Zhong	9a91fa0ed1	docs: fix a bad redirect (#6300)	2025-05-14 10:27:19 -07:00
Mick	cd7c8a8de6	doc: update developer guide regarding mllms (#6138) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-14 23:13:13 +08:00
Yineng Zhang	16267d4fa7	chore: bump v0.4.6.post4 (#6245)	2025-05-13 01:57:51 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Brayden Zhong	3c32895cbe	[Llama4] Add docs note about enable multimodal (#6235)	2025-05-13 10:05:47 +08:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
Brayden Zhong	12319a6787	[Docs] Add docs for `SGLANG_` and `SGL_` environment variables (#6206)	2025-05-13 01:45:41 +08:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Cheng Wan	25c83fff6a	Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558) Co-authored-by: liusy58 <liusy58@linux.alibaba.com>	2025-05-11 23:36:29 -07:00
Lianmin Zheng	01bdbf7f80	Improve structured outputs: fix race condition, server crash, metrics and style (#6188)	2025-05-11 08:36:16 -07:00
Adarsh Shirawalmath	94d42b6794	[Docs] minor Qwen3 and reasoning parser docs fix (#6032)	2025-05-11 08:22:46 -07:00
mlmz	69276f619a	doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (#6199)	2025-05-11 08:22:11 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Yineng Zhang	66fc63d6b1	Revert "feat: add thinking_budget (#6089)" (#6181)	2025-05-10 16:07:45 -07:00
Ximingwang-09	921e4a8185	[Docs]Delete duplicate content (#6146) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-05-10 15:02:15 -07:00
XinyuanTong	9d8ec2e67e	Fix and Clean up chat-template requirement for VLM (#6114) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-11 00:14:09 +08:00
Yineng Zhang	678d8cc987	chore: bump v0.4.6.post3 (#6165)	2025-05-09 15:38:47 -07:00
thyecust	63484f9fd6	feat: add thinking_budget (#6089)	2025-05-09 08:22:09 -07:00
Zhu Chen	fa7d7fd9e5	[Feature] Add FlashAttention3 as a backend for VisionAttention (#5764) Co-authored-by: othame <chenzhu_912@zju.edu.cn> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-05-08 10:01:19 -07:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034)	2025-05-07 18:51:05 -07:00
Baizhou Zhang	fee37d9e8d	[Doc]Fix description for dp_size argument (#6063)	2025-05-08 00:04:22 +08:00
mlmz	a68ed76682	feat: append more comprehensive fields in messages instead of merely role and content (#5996)	2025-05-05 11:43:34 -07:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555)	2025-05-05 10:32:17 -07:00
Lifu Huang	1232f7e8b7	Update dev container config to support live code sync and improve docker setup guide (#6018) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-04 22:33:46 -07:00
vzed	95c231e50d	Tool Call: Add `chat_template_kwargs` documentation (#5679)	2025-05-04 13:12:40 -07:00
Chayenne	73dcf2b326	Remove token in token out in Native API (#5967)	2025-05-01 21:59:43 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908)	2025-05-01 21:28:57 -07:00
江家瑋	ad506a4e6b	docs: Fix Qwen model typo (#5944) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-05-01 10:23:00 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952)	2025-05-01 09:30:27 -07:00
Yineng Zhang	9858113c33	chore: bump v0.4.6.post2 (#5939)	2025-04-30 22:04:40 -07:00
liwenju0	8fefdd32c7	[Feature] add support kimi vl model (#5383) Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-04-29 21:31:19 -07:00
Baizhou Zhang	799789afed	Bump Flashinfer to 0.2.5 (#5870) Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>	2025-04-29 19:50:57 -07:00
Chang Su	2b06484bd1	feat: support pythonic tool call and index in tool call streaming (#5725)	2025-04-29 17:30:44 -07:00
simveit	ae523675e5	[Doc] Tables instead of bulletpoints for sampling doc (#5841)	2025-04-29 13:49:39 -07:00
Adarsh Shirawalmath	5c08aa4958	[Docs] Update docs for Qwen3 and Qwen3MoE (#5836)	2025-04-29 13:48:30 -07:00
Qiaolin Yu	8c0cfca87d	Feat: support cuda graph for LoRA (#4115) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-04-28 23:30:44 -07:00
Yineng Zhang	dcae1fb2cd	chore: bump v0.4.6.post1 (#5845)	2025-04-28 12:57:08 -07:00
Lianmin Zheng	849c83a0c0	[CI] test chunked prefill more (#5798)	2025-04-28 10:57:17 -07:00
Baizhou Zhang	f48b007c1d	[Doc] Recover history of server_arguments.md (#5851)	2025-04-28 10:48:21 -07:00
Michael Yao	966eb90865	[Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276)	2025-04-28 00:36:10 -07:00
Trevor Morris	84810da4ae	Add Cutlass MLA attention backend (#5390)	2025-04-27 20:58:53 -07:00
Huapeng Zhou	86317c09e9	[Docs] update grafana setup guide in production metrics (#5643) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>	2025-04-27 15:36:33 -07:00
Baizhou Zhang	84022c0e56	Release v0.4.6 (#5795)	2025-04-27 14:07:05 -07:00

498 Commits