sglang

Author	SHA1	Message	Date
Shu Wang	3df05f4d6a	[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199)	2025-09-11 20:18:43 -07:00
Liangsheng Yin	6e95f5e5bd	Simplify `Router` arguments passing and build it in docker image (#9964)	2025-09-05 12:13:55 +08:00
Huapeng Zhou	75ee00112d	[Doc] Fix SGLang tool parser doc (#9886)	2025-09-04 21:52:53 +08:00
Lifu Huang	1fbfdebe6b	[chore] fix dead links in doc (#9913)	2025-09-02 00:28:26 -07:00
Liangsheng Yin	f9afa7dceb	Fix docs for clip max new tokens (#9082)	2025-08-11 13:15:21 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031)	2025-08-10 19:49:45 -07:00
Yineng Zhang	9020f7fc32	chore: bump v0.5.0rc0 (#8959)	2025-08-08 09:16:18 -07:00
Yineng Zhang	8cd344586e	chore: bump v0.4.10.post2 (#8727)	2025-08-03 03:43:29 -07:00
Cheng Wan	6c88f6c8d9	[5/N] MoE Refactor: Update MoE parallelism arguments (#8658)	2025-08-01 01:20:03 -07:00
Ke Bao	33f0de337d	chore: bump v0.4.10.post1 (#8652)	2025-08-01 12:07:30 +08:00
Faraz	4b04998d38	TRTLLM Gen MLA Decode Kernel Integration (same as #7938) (#8632) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-07-31 16:03:40 -07:00
Yineng Zhang	023288645b	chore: bump v0.4.10 (#8608)	2025-07-31 20:50:17 +08:00
Yineng Zhang	6478831be9	chore: bump v0.4.9.post6 (#8517)	2025-07-29 02:30:07 -07:00
Yineng Zhang	45bc170b36	chore: bump v0.4.9.post5 (#8458)	2025-07-28 02:11:06 -07:00
Qiaolin Yu	484d0e021d	doc: add bench_one_batch_server in the benchmark doc (#8441)	2025-07-27 23:07:54 -07:00
Yineng Zhang	2272c2a5b5	chore: bump v0.4.9.post4 (#8305)	2025-07-25 17:12:47 -07:00
Xiaoyu Zhang	9045cc1eb8	[torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering `torch.compile` in forward pass (#8353)	2025-07-25 21:17:47 +08:00
Zaili Wang	15d2759174	[CPU] Add tutorial docs for SGL on CPU (#8000)	2025-07-25 00:03:16 -07:00
Yineng Zhang	01c000043c	chore: bump v0.4.9.post3 (#8265)	2025-07-22 15:55:48 -07:00
Yineng Zhang	22bd857cb5	docs: update README (#7985)	2025-07-12 13:31:11 -07:00
Yineng Zhang	eb118d88c4	chore: bump v0.4.9.post2 (#7963)	2025-07-11 21:11:20 -07:00
Yineng Zhang	066f4ec91f	chore: bump v0.4.9.post1 (#7882)	2025-07-09 00:28:17 -07:00
Yineng Zhang	ec5f9c6269	chore: bump v0.4.9 (#7802)	2025-07-05 17:40:29 -07:00
Yuchen Cheng	1e3e3add3d	fix(docs): fix the broken link in `docs/references/production_metrics.md` (#7741) Signed-off-by: rudeigerc <rudeigerc@gmail.com>	2025-07-03 23:46:07 -07:00
Yi Zhang	93b6785d78	add description for llama4 eagle3 (#7688)	2025-07-01 01:19:19 -07:00
ybyang	7349717e4b	[doc] update lws doc for pd (#7318)	2025-07-01 10:39:04 +08:00
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Yineng Zhang	69183f8808	chore: bump v0.4.8.post1 (#7559)	2025-06-26 02:21:12 -07:00
Yineng Zhang	7c3a12c000	chore: bump v0.4.8 (#7493)	2025-06-23 23:14:22 -07:00
Yineng Zhang	f9dc9dd28b	chore: bump v0.4.7.post1 (#7248)	2025-06-16 15:20:29 -07:00
Lianmin Zheng	90bd3e32d6	Improve perf tuning docs (#7071)	2025-06-10 16:55:04 -07:00
Yineng Zhang	4f723edd3b	chore: bump v0.4.7 (#7038)	2025-06-10 01:56:20 -07:00
Yueyang Pan	98c00a2df1	Fix torch profiler bugs for bench_offline_throughput.py (#6557)	2025-06-09 20:33:41 +08:00
HAI	b819381fec	AITER backend extension and workload optimizations (#6838) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-06-05 23:00:18 -07:00
Baizhou Zhang	791b3bfabb	[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)	2025-05-28 16:03:43 -07:00
linzhuo	7a0bbe6a64	update toc for doc and dockerfile code style format (#6450) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-27 13:05:11 +08:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
simveit	e235be16fe	Fix some issues with current docs. (#6588)	2025-05-26 01:04:34 +08:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Brayden Zhong	3c32895cbe	[Llama4] Add docs note about enable multimodal (#6235)	2025-05-13 10:05:47 +08:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
Brayden Zhong	12319a6787	[Docs] Add docs for `SGLANG_` and `SGL_` environment variables (#6206)	2025-05-13 01:45:41 +08:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034)	2025-05-07 18:51:05 -07:00
Chang Su	170d1f218a	feat: Refactor DeepSeekV3 function call (#5908)	2025-05-01 21:28:57 -07:00
Ke Bao	ebaba85655	Update ci test and doc for MTP api change (#5952)	2025-05-01 09:30:27 -07:00
Huapeng Zhou	86317c09e9	[Docs] update grafana setup guide in production metrics (#5643) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>	2025-04-27 15:36:33 -07:00
Frankey_8080	a21ef36352	support for the DeepSeek model by enabling streaming response parsing (#5592)	2025-04-26 18:59:31 -07:00
Lianmin Zheng	155890e4d1	[Minor] fix documentations (#5756)	2025-04-26 17:48:43 -07:00
Baizhou Zhang	ce5412b62e	Turn on DeepGemm By Default and Update Doc (#5628)	2025-04-22 16:10:08 -07:00

199 Commits