sglang

Author	SHA1	Message	Date
simveit	bb121214c2	Variance measure for reasoning benchmark (#3677 )	2025-02-20 03:49:49 +08:00
Zhanghao Wu	f93e915817	[Docs] Add SkyPilot DeepSeek example (#3706 )	2025-02-20 02:10:23 +08:00
Yineng Zhang	fe0673f1cc	set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698 )	2025-02-19 20:50:22 +08:00
yigex	ddf39d3fce	[ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567 )	2025-02-17 17:54:10 -08:00
Xiaoyu Zhang	c38f3aed24	support multi-gpu block-gemm tuning (#3639 )	2025-02-18 00:00:35 +08:00
Shenggui Li	c9565e49e7	[docker] added rdma support (#3619 )	2025-02-17 15:36:16 +08:00
simveit	3d4a8f9bc0	Benchmark for reasoning models (#3532 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-17 03:07:30 +08:00
Yineng Zhang	ac963be234	update flashinfer-python (#3557 )	2025-02-14 09:52:56 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556 )	2025-02-14 09:43:14 +08:00
Yineng Zhang	20de05a753	update README (#3543 )	2025-02-13 17:22:11 +08:00
Jhin	bf2a70872e	Update DeepSeek V3 Doc (#3541 )	2025-02-12 23:15:37 -08:00
Xiaoyu Zhang	693c2600e0	refine deepseek_v3 launch server doc (#3522 )	2025-02-12 17:27:07 +08:00
yigex	fdf04a1426	[ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418 ) Co-authored-by: Bruce Xue <yigex@xilinx.com> Co-authored-by: HAI <hixiao@gmail.com>	2025-02-10 23:55:04 -08:00
Xiaoyu Zhang	2f47d710ae	refine some typo (#3473 )	2025-02-10 23:35:44 +08:00
Yineng Zhang	cddb1cdf8f	chore: bump v0.4.2.post4 (#3459 )	2025-02-10 14:12:16 +08:00
Yineng Zhang	fad315cb8e	fix EAGLE 2 non greedy case (#3407 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-09 07:28:34 +08:00
Yineng Zhang	f90db8bc07	fix typo	2025-02-08 22:16:42 +08:00
Ke Bao	d8ad597048	Add deepseek-v3 a100 serving example (#3404 )	2025-02-08 22:13:52 +08:00
GaoYuYang	849f58d617	Update fused_moe's benchmark (#3346 )	2025-02-08 21:58:21 +08:00
yiakwy-xpu-ml-framework-team	64480df495	[BUG] fix moe benchmark when bs*seq is small (#3382 )	2025-02-08 15:39:44 +08:00
Yineng Zhang	c1f5f99f60	chore: bump v0.4.2.post3 (#3369 )	2025-02-07 08:20:03 -08:00
Xiaoyu Zhang	cdae77b03d	optimize moe_align_kernel cuda (#3347 )	2025-02-07 00:53:46 +08:00
Ke Bao	6792411e7f	[Doc] Add optimization option guide for deepseek v3 (#3349 )	2025-02-06 23:28:09 +08:00
Yineng Zhang	7348d9627e	add AMD guide for DeepSeek-R1 (#3338 )	2025-02-06 16:54:40 +08:00
Xiaoyu Zhang	ad3499858e	clean moe align block kernel code and add acc test (#3332 )	2025-02-06 16:42:36 +08:00
Yineng Zhang	07e58a2dcb	update README (#3324 )	2025-02-06 07:13:05 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Yineng Zhang	7b020cca2d	add tuning block wise fp8 (#3242 ) Co-authored-by: HandH1998 <007aabbcc411@gmail.com>	2025-02-01 03:58:18 +08:00
yigex	351a72d40b	add dsv3 mi300 triton config for block scale (#3146 )	2025-01-27 17:25:53 +08:00
Lianmin Zheng	27acf63bbd	Use torch.compile for scaling penalty (#3133 )	2025-01-25 18:27:33 -08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030 )	2025-01-23 15:29:20 +08:00
yiakwy-xpu-ml-framework-team	10bfce71b3	fix moe align blocks benchmark (#3003 )	2025-01-20 19:33:29 +08:00
Xiaoyu Zhang	83452dbb4a	fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971 )	2025-01-18 18:56:13 -08:00
Xiaoyu Zhang	c2f212d672	optimize MiniMax-Text-01 lightning_attn_decode triton (#2966 )	2025-01-18 23:41:01 +08:00
Zhiqiang Xie	13387e6b7a	Multi-turn benchmark for hierarchical caching (#2942 )	2025-01-17 16:17:24 -08:00
Xiaoyu Zhang	78e974b2a5	[kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920 )	2025-01-16 12:51:38 -08:00
Lianmin Zheng	bc6915e3b9	Improve type annotation and styles (#2926 )	2025-01-16 12:51:11 -08:00
Xiaoyu Zhang	ab31793661	[kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911 )	2025-01-16 14:18:29 +08:00
Yineng Zhang	80002562a8	docs: update README (#2878 )	2025-01-14 12:48:17 +08:00
Lianmin Zheng	46d4431889	Add a new api configure_logging to allow dumping the requests (#2875 )	2025-01-13 14:24:00 -08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870 )	2025-01-13 23:09:00 +08:00
Yineng Zhang	41d7e5b7e6	docs: update link (#2857 )	2025-01-13 18:40:48 +08:00
Lianmin Zheng	72c7776355	Fix linear.py and improve weight loading (#2851 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-13 01:39:14 -08:00
Ke Bao	85b2e05770	Add int8 quant kernel (#2848 )	2025-01-13 13:16:58 +08:00
Yineng Zhang	197cbf9bab	docs: update README (#2841 )	2025-01-11 23:11:38 +08:00
Yineng Zhang	f624901cdd	chore: bump v0.4.1.post5 (#2840 )	2025-01-11 23:10:02 +08:00
sleepcoo	4f077c01b8	minor: support specifying local dataset path for gsm8k and hellaswag (#2816 )	2025-01-09 22:24:42 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761 )	2025-01-07 02:52:53 -08:00
Xiaoyu Zhang	380930a959	add benchmark_moe_align_blocks (#2767 )	2025-01-07 14:20:50 +08:00
Rodrigo Garcia	a990daff9c	Included multi-node DeepSeekv3 example (#2707 )	2025-01-02 22:17:03 +08:00

158 Commits