sglang

Author	SHA1	Message	Date
Baizhou Zhang	791b3bfabb	[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479 )	2025-05-28 16:03:43 -07:00
linzhuo	7a0bbe6a64	update toc for doc and dockerfile code style format (#6450 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-27 13:05:11 +08:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
simveit	e235be16fe	Fix some issues with current docs. (#6588 )	2025-05-26 01:04:34 +08:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109 ) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Brayden Zhong	3c32895cbe	[Llama4] Add docs note about enable multimodal (#6235 )	2025-05-13 10:05:47 +08:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244 )	2025-05-12 12:53:26 -07:00
Brayden Zhong	12319a6787	[Docs] Add docs for `SGLANG_` and `SGL_` environment variables (#6206 )	2025-05-13 01:45:41 +08:00
applesaucethebun	d738ab52f8	fix some typos (#6209 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034 )	2025-05-07 18:51:05 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908 )	2025-05-01 21:28:57 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952 )	2025-05-01 09:30:27 -07:00
Huapeng Zhou	86317c09e9	[Docs] update grafana setup guide in production metrics (#5643 ) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>	2025-04-27 15:36:33 -07:00
Frankey_8080	a21ef36352	support for the DeepSeek model by enabling streaming response parsing (#5592 )	2025-04-26 18:59:31 -07:00
Lianmin Zheng	155890e4d1	[Minor] fix documentations (#5756 )	2025-04-26 17:48:43 -07:00
Baizhou Zhang	ce5412b62e	Turn on DeepGemm By Default and Update Doc (#5628 )	2025-04-22 16:10:08 -07:00
Huapeng Zhou	57131dd955	[Feat.] Enable grafana to show metrics (#4718 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-21 00:43:42 -07:00
Yi Zhou	fac17acf08	add function call parser for DeepSeek V3 (#5224 )	2025-04-20 17:38:08 -07:00
fzyzcjy	9c43477710	Super tiny fix typo (#5559 )	2025-04-20 14:21:18 -07:00
Baizhou Zhang	b54b5a96e4	[Doc]Add instruction for profiling with bench_one_batch (#5581 )	2025-04-20 14:05:36 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480 )	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481 )	2025-04-17 01:43:14 -07:00
Xiaoyu Zhang	06a1656e02	[doc] Update benchmark_and_profiling.md (#5449 )	2025-04-15 23:27:34 -07:00
Baizhou Zhang	a42736bbb8	Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113 )	2025-04-15 22:01:22 -07:00
Baizhou Zhang	f6772f1497	[Fix] Turn off DeepGEMM by default (#5263 )	2025-04-14 17:45:44 -07:00
Adarsh Shirawalmath	a0a9f6d64f	[Docs] Remove the older supported docs section (#5301 )	2025-04-11 11:30:18 -07:00
Kay Yan	f2b70afde0	docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-04-08 20:46:11 -07:00
Ke Bao	ade714a67f	Add Llama4 user guide (#5133 ) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-04-07 19:09:34 -07:00
Chang Su	f04c80dc42	Add Llama4 support (#5092 ) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: ispobock <ispobaoke@163.com>	2025-04-07 00:29:36 -07:00
Baizhou Zhang	efbae697b3	[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052 )	2025-04-05 01:23:02 -07:00
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048 )	2025-04-03 13:30:56 -07:00
Baizhou Zhang	e8999b13b7	Replace enable_flashinfer_mla argument with attention_backend (#5005 )	2025-04-03 02:53:58 -07:00
fzyzcjy	736502d4fd	Tiny fix doc error (#4795 )	2025-03-29 08:22:17 -07:00
Ke Bao	b39532587b	Update doc for DeepSeek-V3-0324 (#4825 )	2025-03-27 13:30:40 -07:00
Pan Lyu	c913ed4046	support clip embedding model (#4506 )	2025-03-27 00:18:15 -07:00
Didier Durand	44f47d3ee1	Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628 )	2025-03-27 00:16:16 -07:00
Mick	1e86457c90	model: Minicpmo (#3023 )	2025-03-24 20:08:40 -07:00
Ximingwang-09	22c3702e1e	[Model] Support Qwen2ForSequenceClassification (#4609 ) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-24 19:13:44 -07:00
Adarsh Shirawalmath	fb8886037c	[Docs] Update docs for gemma3 and VLM chat templates (#4674 )	2025-03-22 08:02:19 -07:00
Michael Yao	c6ec70290f	[docs] Add links and fix grammars in deploy_on_k8s.md (#4641 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-20 22:55:23 -07:00
Ke Bao	bfb03c6182	Update doc for MTP and DP attention (#4622 )	2025-03-20 11:31:48 -07:00
Albert	2d0045125f	Fix the incorrect args in benchmark_and_profiling.md (#4542 ) Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>	2025-03-18 00:07:06 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418 )	2025-03-17 00:03:43 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424 )	2025-03-16 17:37:32 -07:00
江家瑋	26c372c13c	docs: Add Llama 3.3 to supported models (#4453 ) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-03-15 16:33:43 -07:00
Zhan Lu	660305c38a	[Doc] fix wrong flag in deepseek documentation (#4427 )	2025-03-14 11:30:55 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203 )	2025-03-12 11:02:11 -07:00
Michael Yao	8f1f614ee2	[Docs] Clean up benchmark_and_profiling.md (#4297 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-11 21:48:21 -07:00
Ke Bao	3a08f54638	Update MTP doc (#4290 )	2025-03-11 00:46:55 -07:00

165 Commits