sglang

Author	SHA1	Message	Date
Yineng Zhang	2272c2a5b5	chore: bump v0.4.9.post4 (#8305)	2025-07-25 17:12:47 -07:00
Xiaoyu Zhang	9045cc1eb8	[torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering `torch.compile` in forward pass (#8353)	2025-07-25 21:17:47 +08:00
Zaili Wang	15d2759174	[CPU] Add tutorial docs for SGL on CPU (#8000)	2025-07-25 00:03:16 -07:00
Yineng Zhang	01c000043c	chore: bump v0.4.9.post3 (#8265)	2025-07-22 15:55:48 -07:00
Yineng Zhang	22bd857cb5	docs: update README (#7985)	2025-07-12 13:31:11 -07:00
Yineng Zhang	eb118d88c4	chore: bump v0.4.9.post2 (#7963)	2025-07-11 21:11:20 -07:00
Yineng Zhang	066f4ec91f	chore: bump v0.4.9.post1 (#7882)	2025-07-09 00:28:17 -07:00
Yineng Zhang	ec5f9c6269	chore: bump v0.4.9 (#7802)	2025-07-05 17:40:29 -07:00
Yuchen Cheng	1e3e3add3d	fix(docs): fix the broken link in `docs/references/production_metrics.md` (#7741) Signed-off-by: rudeigerc <rudeigerc@gmail.com>	2025-07-03 23:46:07 -07:00
Yi Zhang	93b6785d78	add description for llama4 eagle3 (#7688)	2025-07-01 01:19:19 -07:00
ybyang	7349717e4b	[doc] update lws doc for pd (#7318)	2025-07-01 10:39:04 +08:00
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Yineng Zhang	69183f8808	chore: bump v0.4.8.post1 (#7559)	2025-06-26 02:21:12 -07:00
Yineng Zhang	7c3a12c000	chore: bump v0.4.8 (#7493)	2025-06-23 23:14:22 -07:00
Yineng Zhang	f9dc9dd28b	chore: bump v0.4.7.post1 (#7248)	2025-06-16 15:20:29 -07:00
Lianmin Zheng	90bd3e32d6	Improve perf tuning docs (#7071)	2025-06-10 16:55:04 -07:00
Yineng Zhang	4f723edd3b	chore: bump v0.4.7 (#7038)	2025-06-10 01:56:20 -07:00
Yueyang Pan	98c00a2df1	Fix torch profiler bugs for bench_offline_throughput.py (#6557)	2025-06-09 20:33:41 +08:00
HAI	b819381fec	AITER backend extension and workload optimizations (#6838) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-06-05 23:00:18 -07:00
Baizhou Zhang	791b3bfabb	[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)	2025-05-28 16:03:43 -07:00
linzhuo	7a0bbe6a64	update toc for doc and dockerfile code style format (#6450) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-27 13:05:11 +08:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
simveit	e235be16fe	Fix some issues with current docs. (#6588)	2025-05-26 01:04:34 +08:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Brayden Zhong	3c32895cbe	[Llama4] Add docs note about enable multimodal (#6235)	2025-05-13 10:05:47 +08:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
Brayden Zhong	12319a6787	[Docs] Add docs for `SGLANG_` and `SGL_` environment variables (#6206)	2025-05-13 01:45:41 +08:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034)	2025-05-07 18:51:05 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908)	2025-05-01 21:28:57 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952)	2025-05-01 09:30:27 -07:00
Huapeng Zhou	86317c09e9	[Docs] update grafana setup guide in production metrics (#5643) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>	2025-04-27 15:36:33 -07:00
Frankey_8080	a21ef36352	support for the DeepSeek model by enabling streaming response parsing (#5592)	2025-04-26 18:59:31 -07:00
Lianmin Zheng	155890e4d1	[Minor] fix documentations (#5756)	2025-04-26 17:48:43 -07:00
Baizhou Zhang	ce5412b62e	Turn on DeepGemm By Default and Update Doc (#5628)	2025-04-22 16:10:08 -07:00
Huapeng Zhou	57131dd955	[Feat.] Enable grafana to show metrics (#4718) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-21 00:43:42 -07:00
Yi Zhou	fac17acf08	add function call parser for DeepSeek V3 (#5224)	2025-04-20 17:38:08 -07:00
fzyzcjy	9c43477710	Super tiny fix typo (#5559)	2025-04-20 14:21:18 -07:00
Baizhou Zhang	b54b5a96e4	[Doc]Add instruction for profiling with bench_one_batch (#5581)	2025-04-20 14:05:36 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480)	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481)	2025-04-17 01:43:14 -07:00
Xiaoyu Zhang	06a1656e02	[doc] Update benchmark_and_profiling.md (#5449)	2025-04-15 23:27:34 -07:00
Baizhou Zhang	a42736bbb8	Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)	2025-04-15 22:01:22 -07:00
Baizhou Zhang	f6772f1497	[Fix] Turn off DeepGEMM by default (#5263)	2025-04-14 17:45:44 -07:00
Adarsh Shirawalmath	a0a9f6d64f	[Docs] Remove the older supported docs section (#5301)	2025-04-11 11:30:18 -07:00
Kay Yan	f2b70afde0	docs: remove the use of Downward API for LWS_WORKER_INDEX (#5110) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-04-08 20:46:11 -07:00
Ke Bao	ade714a67f	Add Llama4 user guide (#5133) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-04-07 19:09:34 -07:00
Chang Su	f04c80dc42	Add Llama4 support (#5092) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: ispobock <ispobaoke@163.com>	2025-04-07 00:29:36 -07:00
Baizhou Zhang	efbae697b3	[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)	2025-04-05 01:23:02 -07:00
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)	2025-04-03 13:30:56 -07:00

184 Commits