sglang

Author	SHA1	Message	Date
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034 )	2025-05-07 18:51:05 -07:00
Baizhou Zhang	fee37d9e8d	[Doc]Fix description for dp_size argument (#6063 )	2025-05-08 00:04:22 +08:00
mlmz	a68ed76682	feat: append more comprehensive fields in messages instead of merely role and content (#5996 )	2025-05-05 11:43:34 -07:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555 )	2025-05-05 10:32:17 -07:00
Lifu Huang	1232f7e8b7	Update dev container config to support live code sync and improve docker setup guide (#6018 ) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-04 22:33:46 -07:00
vzed	95c231e50d	Tool Call: Add `chat_template_kwargs` documentation (#5679 )	2025-05-04 13:12:40 -07:00
Chayenne	73dcf2b326	Remove token in token out in Native API (#5967 )	2025-05-01 21:59:43 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908 )	2025-05-01 21:28:57 -07:00
江家瑋	ad506a4e6b	docs: Fix Qwen model typo (#5944 ) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-05-01 10:23:00 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952 )	2025-05-01 09:30:27 -07:00
Yineng Zhang	9858113c33	chore: bump v0.4.6.post2 (#5939 )	2025-04-30 22:04:40 -07:00
liwenju0	8fefdd32c7	[Feature] add support kimi vl model (#5383 ) Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-04-29 21:31:19 -07:00
Baizhou Zhang	799789afed	Bump Flashinfer to 0.2.5 (#5870 ) Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>	2025-04-29 19:50:57 -07:00
Chang Su	2b06484bd1	feat: support pythonic tool call and index in tool call streaming (#5725 )	2025-04-29 17:30:44 -07:00
simveit	ae523675e5	[Doc] Tables instead of bulletpoints for sampling doc (#5841 )	2025-04-29 13:49:39 -07:00
Adarsh Shirawalmath	5c08aa4958	[Docs] Update docs for Qwen3 and Qwen3MoE (#5836 )	2025-04-29 13:48:30 -07:00
Qiaolin Yu	8c0cfca87d	Feat: support cuda graph for LoRA (#4115 ) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-04-28 23:30:44 -07:00
Yineng Zhang	dcae1fb2cd	chore: bump v0.4.6.post1 (#5845 )	2025-04-28 12:57:08 -07:00
Lianmin Zheng	849c83a0c0	[CI] test chunked prefill more (#5798 )	2025-04-28 10:57:17 -07:00
Baizhou Zhang	f48b007c1d	[Doc] Recover history of server_arguments.md (#5851 )	2025-04-28 10:48:21 -07:00
Michael Yao	966eb90865	[Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276 )	2025-04-28 00:36:10 -07:00
Trevor Morris	84810da4ae	Add Cutlass MLA attention backend (#5390 )	2025-04-27 20:58:53 -07:00
Huapeng Zhou	86317c09e9	[Docs] update grafana setup guide in production metrics (#5643 ) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>	2025-04-27 15:36:33 -07:00
Baizhou Zhang	84022c0e56	Release v0.4.6 (#5795 )	2025-04-27 14:07:05 -07:00
Frankey_8080	a21ef36352	support for the DeepSeek model by enabling streaming response parsing (#5592 )	2025-04-26 18:59:31 -07:00
Lianmin Zheng	155890e4d1	[Minor] fix documentations (#5756 )	2025-04-26 17:48:43 -07:00
Lianmin Zheng	5641a09458	Revert "[Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct)" (#5754 )	2025-04-25 15:50:28 -07:00
Brayden Zhong	43fb95c2fa	[Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct) (#5078 ) Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>	2025-04-25 15:24:09 +08:00
Michael Yao	b5be56944b	[Doc] Fix a link to Weilin Zhao (#5706 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 02:02:27 +08:00
Michael Yao	7c99103f4c	[Doc] Fix two 404 links caused by sglang typo (#5667 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-23 23:21:55 +08:00
Baizhou Zhang	ce5412b62e	Turn on DeepGemm By Default and Update Doc (#5628 )	2025-04-22 16:10:08 -07:00
Michael Yao	92bb64bc86	[Doc] Fix a 404 link to llama-405b (#5615 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-21 20:39:37 -07:00
Yineng Zhang	b9c87e781d	chore: bump v0.4.5.post3 (#5611 )	2025-04-21 18:16:20 -07:00
Huapeng Zhou	57131dd955	[Feat.] Enable grafana to show metrics (#4718 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-21 00:43:42 -07:00
simveit	8de53da989	smaller and non gated models for docs (#5378 )	2025-04-20 17:38:25 -07:00
Yi Zhou	fac17acf08	add function call parser for DeepSeek V3 (#5224 )	2025-04-20 17:38:08 -07:00
Adarsh Shirawalmath	8b39274e34	[Feature] Prefill assistant response - add continue_final_message parameter (#4226 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-04-20 17:37:18 -07:00
lukec	417b44eba8	[Feat] upgrade pytorch2.6 (#5417 )	2025-04-20 16:06:34 -07:00
Baizhou Zhang	072b4d0398	Add document for LoRA serving (#5521 )	2025-04-20 14:37:57 -07:00
fzyzcjy	9c43477710	Super tiny fix typo (#5559 )	2025-04-20 14:21:18 -07:00
Lianmin Zheng	fbdc94ba59	Release v0.4.5.post2 (#5582 )	2025-04-20 14:12:37 -07:00
Baizhou Zhang	b54b5a96e4	[Doc]Add instruction for profiling with bench_one_batch (#5581 )	2025-04-20 14:05:36 -07:00
Yineng Zhang	0961feefca	feat: use flashinfer jit package (#5547 )	2025-04-19 00:28:39 -07:00
Yineng Zhang	a6f892e5d0	Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544 )	2025-04-18 16:50:21 -07:00
Wenxuan Tan	bfa3922451	Avoid computing lse in Ragged Prefill when there's no prefix. (#5476 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-04-18 01:13:57 -07:00
Michael Yao	a0fc5bc144	[docs] Fix several consistency issues in sampling_params.md (#5373 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-04-18 10:54:40 +08:00
mlmz	f13d65a7ea	Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (#5503 )	2025-04-17 11:37:43 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480 )	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481 )	2025-04-17 01:43:14 -07:00
Didier Durand	92d1561b70	Update attention_backend.md: plural form (#5489 )	2025-04-17 01:42:40 -07:00

472 Commits