sglang

Author	SHA1	Message	Date
Shenggui Li	c6a4852136	[docs] added torch.compile cache to dpsk manual (#3737)	2025-02-21 00:11:40 -08:00
Baizhou Zhang	ac05310098	[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)	2025-02-21 03:37:56 +08:00
Chayenne	3c7bfd7eab	Docs: Fix layout with sub-section (#3710)	2025-02-19 15:44:30 -08:00
Shi Shuai	55de40f782	[Docs]: Fix Multi-User Port Allocation Conflicts (#3601) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: simveit <simp.veitner@gmail.com>	2025-02-19 11:15:44 -08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
ybyang	c51dc2cc8d	Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624)	2025-02-17 18:14:20 -08:00
Yineng Zhang	a5375adc3a	chore: bump v0.4.3.post2 (#3645) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-18 02:48:30 +08:00
Yineng Zhang	75d171a9c5	chore: update flashinfer v0.2.1.post2 (#3644)	2025-02-18 02:47:42 +08:00
Yineng Zhang	e782eb7e6a	chore: bump v0.4.3.post1 (#3638)	2025-02-17 21:58:19 +08:00
Shenggui Li	c9565e49e7	[docker] added rdma support (#3619)	2025-02-17 15:36:16 +08:00
Shi Shuai	d03c4c25a7	[docs] Update sampling_params.md (#3617)	2025-02-16 18:52:30 -08:00
simveit	8f13377dea	Draft of updated doc for sampling params. (#3260) Co-authored-by: shuaills <shishuaicareer@gmail.com>	2025-02-16 14:28:22 -08:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258)	2025-02-16 00:58:53 -08:00
Shenggui Li	231c40d859	[docs] added favicon to sphinx html (#3564)	2025-02-15 10:21:21 -08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00
Shi Shuai	7443197a63	[CI] Improve Docs CI Efficiency (#3587) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-14 19:57:00 -08:00
Yineng Zhang	4e23c961e8	docs: update install (#3581)	2025-02-14 18:54:50 +08:00
Yineng Zhang	31eec35ba8	fix doc (#3558)	2025-02-14 10:11:31 +08:00
Yineng Zhang	ac963be234	update flashinfer-python (#3557)	2025-02-14 09:52:56 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556)	2025-02-14 09:43:14 +08:00
simveit	368de3661e	Update install docs (#3553) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-13 13:42:51 -08:00
Jhin	bf2a70872e	Update DeepSeek V3 Doc (#3541)	2025-02-12 23:15:37 -08:00
Zachary Streeter	8adbc78b30	added llama and cleaned up (#3503)	2025-02-12 18:48:30 +08:00
Mick	ced680663c	doc: Support a new vLM (#3405) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-12 00:43:14 -08:00
Yineng Zhang	2f48221033	docs: update install	2025-02-12 03:13:31 +08:00
Zachary Streeter	2491cc928d	add deepseek-v3 amd docker command (#3495)	2025-02-12 03:03:08 +08:00
Didier Durand	67c5de9286	fix router typo (#3496)	2025-02-12 03:00:57 +08:00
Didier Durand	1e2cf2b541	fix server_arguments typo (#3499)	2025-02-12 02:59:53 +08:00
Didier Durand	9490d15772	fix supported_models Qwen typo (#3498)	2025-02-12 02:59:18 +08:00
Didier Durand	eefcbdd353	fix deepseek_v3 typo (#3497)	2025-02-12 02:58:36 +08:00
Jackmin801	5f0e7de339	[Feat] Return hidden states (experimental) (#3364) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-10 15:54:37 -08:00
Yineng Zhang	cddb1cdf8f	chore: bump v0.4.2.post4 (#3459)	2025-02-10 14:12:16 +08:00
Yineng Zhang	27c4c9cf52	remove _grouped_size_compiled_for_decode_kernels (#3453)	2025-02-10 13:01:21 +08:00
Ying Sheng	52a492a16e	Update contribution_guide.md (#3452)	2025-02-10 12:53:47 +08:00
Shi Shuai	20cf910d8f	[docs] Update quantization documentation (#3437) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: jamessand <shazhizhou0@gmail.com>	2025-02-09 10:39:49 -08:00
Wenxuan Tan	0af1d239cb	[Docs] Add quantization docs (#3410) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-02-10 02:16:21 +08:00
Yineng Zhang	4d2dbeaca7	remove cutex dependency (#3422)	2025-02-09 18:33:20 +08:00
Shi Shuai	6702592d0e	[docs] Add multi-node inference example for SLURM in documentation (#3408) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: aflah02 <aflah20082@iiitd.ac.in>	2025-02-08 21:45:14 -08:00
Zachary Streeter	0a6f18f068	added amd_configure.md to references (#3275) Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-07 08:50:49 -08:00
Yineng Zhang	c1f5f99f60	chore: bump v0.4.2.post3 (#3369)	2025-02-07 08:20:03 -08:00
Shi Shuai	591e751e07	Fix: Runtime error for function calling (#3300)	2025-02-06 20:52:01 -08:00
Chayenne	76ca91dff2	Docs/CI: Enable Fake Finish for Docs Only PR (#3350)	2025-02-06 19:33:31 -08:00
Yineng Zhang	7aad8d1854	chore: bump v0.4.2.post2 (#3313)	2025-02-05 17:35:02 +08:00
Yineng Zhang	6186a8f889	update flashinfer install index url (#3293)	2025-02-05 00:44:35 +08:00
HAI	2c1a695ff1	ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)	2025-02-04 21:44:44 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
simveit	7b5a374114	Update server args doc (#3273) Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>	2025-02-03 23:39:41 +00:00
Liangjun Song	455bfe8dd3	Add a Doc about guide on nvidia jetson #3182 (#3205) Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-02 20:29:10 -08:00
HAI	566d61d90f	ROCm: bump 6.3.0 (#3259)	2025-02-03 04:13:40 +08:00
Chayenne	55f5fc68ac	Docs: Update accuracy evaluation (#3261)	2025-02-02 11:14:59 -08:00

278 Commits