sglang

Author	SHA1	Message	Date
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918)	2025-04-04 01:59:29 -07:00
Brayden Zhong	b149b39353	[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)	2025-03-27 19:45:02 -07:00
Chunan Zeng	14269198e3	[Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735)	2025-03-24 20:56:31 -07:00
Tongbao Zhang	3980ff1be6	rename benchmark_deepgemm_fp8_group_gemm.py (#4605)	2025-03-23 23:35:20 -07:00
penguin_wwy	38f25e87fc	Correcting default configuration when benchmarking fused_moe (#4665)	2025-03-22 00:52:34 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Xiaoyu Zhang	7130a7cea9	refine sgl_moe_align_block_size_benchmark (#4327)	2025-03-11 22:48:38 -07:00
yych0745	6a02b32d07	Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-03-11 00:49:06 -07:00
Lianmin Zheng	66301e124f	Improve code styles (#4021)	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Stefan He	0194948fd9	Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014)	2025-03-02 23:29:55 -08:00
Stefan He	b7e274f2d9	Add Benchmark for DeepGEMM Group GEMM (#3993)	2025-03-02 17:47:21 -08:00
Xiaoyu Zhang	50f28f65a0	fix typo in deep gemm benchmarking(#3991)	2025-03-02 00:34:00 -08:00
Xiaoyu Zhang	90a55e2566	add deepgemm and sglang fp8 block-wise gemm benchmark (#3893)	2025-03-01 23:01:58 -08:00
Chayenne	18bb216c28	Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)	2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team	1c96fa86cf	[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)	2025-02-27 19:42:48 -08:00
laixin	b0df5d240b	Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-02-27 10:59:46 +00:00
yigex	ddf39d3fce	[ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567)	2025-02-17 17:54:10 -08:00
Xiaoyu Zhang	c38f3aed24	support multi-gpu block-gemm tuning (#3639)	2025-02-18 00:00:35 +08:00
yigex	fdf04a1426	[ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418) Co-authored-by: Bruce Xue <yigex@xilinx.com> Co-authored-by: HAI <hixiao@gmail.com>	2025-02-10 23:55:04 -08:00
Xiaoyu Zhang	2f47d710ae	refine some typo (#3473)	2025-02-10 23:35:44 +08:00
Yineng Zhang	fad315cb8e	fix EAGLE 2 non greedy case (#3407) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-09 07:28:34 +08:00
GaoYuYang	849f58d617	Update fused_moe's benchmark (#3346)	2025-02-08 21:58:21 +08:00
yiakwy-xpu-ml-framework-team	64480df495	[BUG] fix moe benchmark when bs*seq is small (#3382)	2025-02-08 15:39:44 +08:00
Xiaoyu Zhang	cdae77b03d	optimize moe_align_kernel cuda (#3347)	2025-02-07 00:53:46 +08:00
Xiaoyu Zhang	ad3499858e	clean moe align block kernel code and add acc test (#3332)	2025-02-06 16:42:36 +08:00
Yineng Zhang	7b020cca2d	add tuning block wise fp8 (#3242) Co-authored-by: HandH1998 <007aabbcc411@gmail.com>	2025-02-01 03:58:18 +08:00
yigex	351a72d40b	add dsv3 mi300 triton config for block scale (#3146)	2025-01-27 17:25:53 +08:00
Lianmin Zheng	27acf63bbd	Use torch.compile for scaling penalty (#3133)	2025-01-25 18:27:33 -08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
yiakwy-xpu-ml-framework-team	10bfce71b3	fix moe align blocks benchmark (#3003)	2025-01-20 19:33:29 +08:00
Xiaoyu Zhang	83452dbb4a	fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971)	2025-01-18 18:56:13 -08:00
Xiaoyu Zhang	c2f212d672	optimize MiniMax-Text-01 lightning_attn_decode triton (#2966)	2025-01-18 23:41:01 +08:00
Xiaoyu Zhang	78e974b2a5	[kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920)	2025-01-16 12:51:38 -08:00
Xiaoyu Zhang	ab31793661	[kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911)	2025-01-16 14:18:29 +08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870)	2025-01-13 23:09:00 +08:00
Ke Bao	85b2e05770	Add int8 quant kernel (#2848)	2025-01-13 13:16:58 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Xiaoyu Zhang	380930a959	add benchmark_moe_align_blocks (#2767)	2025-01-07 14:20:50 +08:00
HandH1998	afa0341e57	Update Triton configs for block fp8 kernels (#2641)	2024-12-29 22:53:47 +08:00
Yineng Zhang	7863e4368a	add configs for block fp8 related kernels (#2628) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-28 23:12:04 +08:00
Xiaoyu Zhang	9a23c48456	h100 tuning fused_moe_triton for qwen2 moe (#2560)	2024-12-26 03:13:31 -08:00
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00
Xiaoyu Zhang	7d672d277b	[kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel (#2509)	2024-12-22 02:31:02 -08:00
bjmsong	e21026690d	benchmark decoding attention kernel with cudnn (#2467) Co-authored-by: root <bjmsong@126.com>	2024-12-17 03:31:57 -08:00
Xiaoyu Zhang	a0592c059f	[Benchmark] add a benchmark for hf/vllm/sglang rmsnorm (#2486)	2024-12-15 13:52:08 +08:00
bjmsong	f67723940d	decoding attention kernel benchmark (#2425) Co-authored-by: root <bjmsong@126.com>	2024-12-11 04:46:59 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Lianmin Zheng	33deca81b5	Add more fused moe benchmark utilities (#2314)	2024-12-02 04:26:55 -08:00

51 Commits