Commit history (newest first): hash | date | author | message

0475448ee3 | 2025-08-06 21:37:50 +08:00 | Ke Bao | Optimize triton swa kernel by skipping computation (#8860)
1466c1b896 | 2025-07-28 14:32:58 -07:00 | Yineng Zhang | feat: support glm4 tuning (#8473)
6d6a8bc278 | 2025-07-27 22:54:07 -07:00 | Yuxuan Zhang | GLM-4.5 Model Support (#8224)
    Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
    Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
    Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
abda2542d5 | 2025-07-19 17:33:50 -07:00 | Cheng Wan | Fix tuning_fused_moe_triton.py (#8175)
253454de9b | 2025-07-06 20:05:49 -07:00 | Yuan Luo | Integrate triton moe kernel (#7689)
    Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
0ae1e9a755 | 2025-06-15 21:21:32 -07:00 | Xiaoyu Zhang | refine fused_moe benchmark (#7221)
ef32677444 | 2025-06-11 18:31:13 -07:00 | Quanfeng Li | Fix positional argument (#7093)
3712abfaf9 | 2025-06-08 15:24:24 -07:00 | Xiaoyu Zhang | Fuse routed scaling factor in deepseek (#6970)
fa3592cfeb | 2025-06-08 05:01:34 -07:00 | Xiaoyu Zhang | rebase h20 fused_moe config (#6966)
1fb76ebb93 | 2025-06-07 21:02:49 -07:00 | Yineng Zhang | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968)
515ef4facb | 2025-06-07 11:06:50 -07:00 | Xiaoyu Zhang | Fuse routed scaling factor in topk_reduce kernel (#6220)
8e3797be1c | 2025-06-04 22:11:24 -07:00 | zyksir | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)
81964328b7 | 2025-06-04 15:53:22 -07:00 | Cheng Wan | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736)
8a5480528d | 2025-06-03 17:48:24 -07:00 | Cheng Wan | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735)
d9d35def3d | 2025-05-29 11:47:21 -07:00 | JieXin Liang | [test] add ut and bm for get_last_loc (#6746)
6df81e8a39 | 2025-05-29 08:12:22 -07:00 | fzyzcjy | Support tuning DeepEP configs (#6742)
485a023bd8 | 2025-05-29 00:15:11 -07:00 | ChangyiYang | refactor apply_w8a8_block_fp8_linear in fp (#6545)
076103535c | 2025-05-28 00:20:01 -07:00 | Xiaoyu Zhang | fix log_info_on_rank0 error when run benchmark (#6260)
c087ddd686 | 2025-05-28 00:15:23 -07:00 | Yuan Luo | Refine pre_reorder_triton_kernel slightly to improve performance (#6627)
    Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
ef8ec07b2c | 2025-05-12 15:47:01 -07:00 | fzyzcjy | Support tuning moe for llama 4 model (#6042)
e8e18dcdcc | 2025-05-12 12:53:26 -07:00 | Lianmin Zheng | Revert "fix some typos" (#6244)
d738ab52f8 | 2025-05-13 01:42:38 +08:00 | applesaucethebun | fix some typos (#6209)
    Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
6e2da51561 | 2025-05-11 14:32:49 -07:00 | Lifu Huang | Replace time.time() to time.perf_counter() for benchmarking. (#6178)
    Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
1cc326032d | 2025-04-28 17:04:54 -07:00 | Xiaoyu Zhang | simplify fused_moe config logging (#5801)
a0251a3fd6 | 2025-04-28 11:55:52 -07:00 | Yi Zhang | add fused moe config for qwen3moe fp8/bf16 (#5849)
e132cba2a8 | 2025-04-28 09:13:04 -07:00 | Xiaoyu Zhang | fused moe triton tuning script support qwen3 (#5842)
0045f4b2af | 2025-04-28 08:37:13 -07:00 | XinyuanTong | feat: Add fused moe triton config for qwen3 moe on h100 (#5833)
c555d794f7 | 2025-04-19 23:45:27 -07:00 | Zhaoyi Li | Minor update for ROCm variable style (#5562)
61e7c4dd21 | 2025-04-14 18:39:44 -07:00 | lambert0312 | Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368)
3e4794aad8 | 2025-04-12 10:01:13 -07:00 | Xiaoyu Zhang | refine fused_moe tuning docs (#5294)
924ca7c92c | 2025-04-04 01:59:29 -07:00 | Xiaoyu Zhang | Add DeepSeek V3/R1 shared experts fusion (#4918)
b149b39353 | 2025-03-27 19:45:02 -07:00 | Brayden Zhong | [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)
14269198e3 | 2025-03-24 20:56:31 -07:00 | Chunan Zeng | [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735)
3980ff1be6 | 2025-03-23 23:35:20 -07:00 | Tongbao Zhang | rename benchmark_deepgemm_fp8_group_gemm.py (#4605)
38f25e87fc | 2025-03-22 00:52:34 -07:00 | penguin_wwy | Correcting default configuration when benchmarking fused_moe (#4665)
1a3fa75f2f | 2025-03-16 00:02:47 -07:00 | JieXin Liang | [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466)
7130a7cea9 | 2025-03-11 22:48:38 -07:00 | Xiaoyu Zhang | refine sgl_moe_align_block_size_benchmark (#4327)
6a02b32d07 | 2025-03-11 00:49:06 -07:00 | yych0745 | Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287)
    Co-authored-by: HandH1998 <1335248067@qq.com>
66301e124f | 2025-03-03 03:20:23 -08:00 | Lianmin Zheng | Improve code styles (#4021)
ac2387279e | 2025-03-03 00:12:04 -08:00 | Lianmin Zheng | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    Co-authored-by: dhou-xai <dhou@x.ai>
    Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
0194948fd9 | 2025-03-02 23:29:55 -08:00 | Stefan He | Optimize Triton Kernel of Group GEMM in DeepGEMM Benchmark (#4014)
b7e274f2d9 | 2025-03-02 17:47:21 -08:00 | Stefan He | Add Benchmark for DeepGEMM Group GEMM (#3993)
50f28f65a0 | 2025-03-02 00:34:00 -08:00 | Xiaoyu Zhang | fix typo in deep gemm benchmarking (#3991)
90a55e2566 | 2025-03-01 23:01:58 -08:00 | Xiaoyu Zhang | add deepgemm and sglang fp8 block-wise gemm benchmark (#3893)
18bb216c28 | 2025-02-28 23:57:17 -08:00 | Chayenne | Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)
1c96fa86cf | 2025-02-27 19:42:48 -08:00 | yiakwy-xpu-ml-framework-team | [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)
b0df5d240b | 2025-02-27 10:59:46 +00:00 | laixin | Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
ddf39d3fce | 2025-02-17 17:54:10 -08:00 | yigex | [ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567)
c38f3aed24 | 2025-02-18 00:00:35 +08:00 | Xiaoyu Zhang | support multi-gpu block-gemm tuning (#3639)
fdf04a1426 | 2025-02-10 23:55:04 -08:00 | yigex | [ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418)
    Co-authored-by: Bruce Xue <yigex@xilinx.com>
    Co-authored-by: HAI <hixiao@gmail.com>