sglang

Author	SHA1	Message	Date
lukec	ffa1b3e318	Add an example of using deepseekv3 int8 sglang. (#4177) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-07 01:56:09 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)	2025-03-06 16:46:20 -08:00
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148)	2025-03-06 13:21:54 -08:00
samzong	d2d0d061d9	fix cross-reference error and spelling mistakes (#4101) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 16:39:02 -08:00
Yineng Zhang	0aaccbbfec	revert deepseek docs (#4109)	2025-03-05 13:23:11 -08:00
Chayenne	e70fa279bc	Docs: reorganize dpsk docs (#4108)	2025-03-05 13:01:03 -08:00
Tommy Yang	abe74b7b59	Docs: Add DeepSeek optimization ablations documentation (#4107) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:51 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
Xihuai Wang	95575aa76a	Reasoning parser (#4000) Co-authored-by: Lucas Pickup <lupickup@microsoft.com>	2025-03-03 21:16:36 -08:00
Chayenne	146ac8df07	Add examples in sampling parameters (#4039)	2025-03-03 13:04:32 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Yudi Xue	a7000a7650	Update metrics documentation (#3264)	2025-03-03 05:03:58 -08:00
Lianmin Zheng	9e1014cf99	Revert "Add fast decode plan for flashinfer mla" (#4008)	2025-03-02 19:29:10 -08:00
Baizhou Zhang	fa56106731	Add fast decode plan for flashinfer mla (#3987)	2025-03-02 19:16:37 -08:00
Yineng Zhang	5d86016855	revert "Docs: Reorngaize dpsk links #3900" (#3933)	2025-02-27 08:57:13 -08:00
Baizhou Zhang	3e02526b1f	[Doc] Add experimental tag for flashinfer mla (#3925)	2025-02-27 01:55:36 -08:00
Stefan He	d8a98a2cad	[Docs] Improve DPSK docs in dark mode (#3914)	2025-02-27 00:13:04 -08:00
Baizhou Zhang	71ed01833d	[doc] Update document for flashinfer mla (#3907)	2025-02-26 20:40:45 -08:00
Chayenne	7c1692aa90	Docs: Reorngaize dpsk links (#3900)	2025-02-26 15:16:31 -08:00
Chayenne	8f019c7d1a	Docs: Move dpsk docs forward a step (#3894)	2025-02-26 11:43:20 -08:00
Shenggui Li	3dc9ff3ce8	[doc] fixed dpsk quant faq (#3865)	2025-02-25 19:40:47 -08:00
Shenggui Li	06427dfab1	[doc] added quantization doc for dpsk (#3843)	2025-02-25 09:43:28 -08:00
Shenggui Li	c0bb9eb3b3	[improve] made timeout configurable (#3803)	2025-02-25 00:26:08 -08:00
Baizhou Zhang	4d2a88bdff	[Docs]Add instruction for manually stopping nsys profiler (#3795) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-23 13:21:48 -08:00
Mick	45205d88a0	bench: Add MMMU benchmark for vLM (#3562)	2025-02-22 08:10:59 -08:00
simveit	20b765a26e	Model: Support Qwen 72B RM model. (#3772)	2025-02-21 14:38:21 -08:00
simveit	4592afc27d	Docs: Fix layout to docs (#3733)	2025-02-21 11:24:13 -08:00
Shakhizat Nurgaliyev	d8d75d256a	Change description of nvidia jetson docs (#3761)	2025-02-21 20:44:22 +08:00
Shenggui Li	c6a4852136	[docs] added torch.compile cache to dpsk manual (#3737)	2025-02-21 00:11:40 -08:00
Baizhou Zhang	ac05310098	[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)	2025-02-21 03:37:56 +08:00
Chayenne	3c7bfd7eab	Docs: Fix layout with sub-section (#3710)	2025-02-19 15:44:30 -08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
ybyang	c51dc2cc8d	Docs: Deploy multi-node inference (LWS method) using sglang in a K8s cluster (#3624)	2025-02-17 18:14:20 -08:00
Shenggui Li	c9565e49e7	[docker] added rdma support (#3619)	2025-02-17 15:36:16 +08:00
Shi Shuai	d03c4c25a7	[docs] Update sampling_params.md (#3617)	2025-02-16 18:52:30 -08:00
simveit	8f13377dea	Draft of updated doc for sampling params. (#3260) Co-authored-by: shuaills <shishuaicareer@gmail.com>	2025-02-16 14:28:22 -08:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258)	2025-02-16 00:58:53 -08:00
Jhin	bf2a70872e	Update DeepSeek V3 Doc (#3541)	2025-02-12 23:15:37 -08:00
Zachary Streeter	8adbc78b30	added llama and cleaned up (#3503)	2025-02-12 18:48:30 +08:00
Mick	ced680663c	doc: Support a new vLM (#3405) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-12 00:43:14 -08:00
Zachary Streeter	2491cc928d	add deepseek-v3 amd docker command (#3495)	2025-02-12 03:03:08 +08:00
Didier Durand	9490d15772	fix supported_models Qwen typo (#3498)	2025-02-12 02:59:18 +08:00
Didier Durand	eefcbdd353	fix deepseek_v3 typo (#3497)	2025-02-12 02:58:36 +08:00
Ying Sheng	52a492a16e	Update contribution_guide.md (#3452)	2025-02-10 12:53:47 +08:00
Shi Shuai	20cf910d8f	[docs] Update quantization documentation (#3437) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: jamessand <shazhizhou0@gmail.com>	2025-02-09 10:39:49 -08:00
Wenxuan Tan	0af1d239cb	[Docs] Add quantization docs (#3410) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-02-10 02:16:21 +08:00
Shi Shuai	6702592d0e	[docs] Add multi-node inference example for SLURM in documentation (#3408) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: aflah02 <aflah20082@iiitd.ac.in>	2025-02-08 21:45:14 -08:00
Zachary Streeter	0a6f18f068	added amd_configure.md to references (#3275) Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-07 08:50:49 -08:00
Shi Shuai	591e751e07	Fix: Runtime error for function calling (#3300)	2025-02-06 20:52:01 -08:00

111 Commits