sglang

Yuan Luo 53dcc750b6 [sgl-kernel] Support FlashInfer top_k_top_p_sampling_from_logits (#9060)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

2025-08-14 10:56:36 -07:00

bench_activation.py

[AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135)

2025-07-24 23:44:28 -07:00

bench_awq_dequant.py

Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)

2025-03-12 00:10:02 -07:00

bench_cutlass_mla.py

[fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184)

2025-06-14 12:45:41 -07:00

bench_dsv3_fused_a_gemm.py

Add dsv3 fused a gemm to sgl-kernel (#7630)

2025-06-29 02:52:24 -07:00

bench_dsv3_router_gemm.py

[Kimi K2] dsv3_router_gemm supports NUM_EXPERTS == 384 (#8013)

2025-08-01 22:01:24 +08:00

bench_fp4_gemm.py

Use FlashInfer FP4 gemm. (#8241)

2025-07-27 01:05:22 -07:00

bench_fp8_blockwise_gemm.py

refactor apply_w8a8_block_fp8_linear in fp (#6545)

2025-05-29 00:15:11 -07:00

bench_fp8_blockwise_group_gemm.py

fix benchmark fp8 blockwise group gemm (#8815)

2025-08-06 21:02:21 +08:00

bench_fp8_gemm.py

[Perf] Tunings for SM100 FP8 CUTLASS kernel (#8818)

2025-08-13 21:59:22 -07:00

bench_int8_gemm.py

Add shapes for int8 gemm benchmark (#3093)

2025-01-24 12:27:30 +08:00

bench_lightning_attention_decode.py

[Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466)

2025-03-16 00:02:47 -07:00

bench_moe_align_block_size.py

update sgl-kernel for EP: kernel part (#8514)

2025-07-30 22:19:55 -07:00

bench_moe_ep_post_reorder.py

feat: integrate deepgemm into EPMoE (#6821)

2025-06-23 01:38:58 -07:00

bench_moe_ep_pre_reorder.py

fix ep_moe_reorder kernel bugs (#6858)

2025-06-04 19:13:59 +08:00

bench_moe_fused_gate.py

[fix] benchmark : routed_scaling_factor is None (#8059)

2025-07-22 08:55:35 -07:00

bench_moe_silu_and_mul.py

[sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919)

2025-06-11 20:43:08 -07:00

bench_moe_topk_softmax.py

[optimize] fuse renormalize into moe_topk_softmax (#7744)

2025-07-03 12:42:44 -07:00

bench_nvfp4_scaled_gemm.py

Add nvfp4 scaled mm benchmark. (#8401)

2025-07-26 23:18:04 -07:00

bench_per_tensor_quant_fp8.py

update variable naming and comments for rocm (#5299)

2025-04-11 23:15:05 -07:00

bench_per_token_group_quant_8bit.py

Fix bench script making input data on L2 cache (#7739)

2025-07-27 00:30:24 -07:00

bench_per_token_quant_fp8.py

fix per token cuda kernel hidden dim cannot divide by 16 (#8543)

2025-08-01 09:27:18 -07:00

bench_qserve_w4a8_gemm.py

[1/2] Support Qserve (#6457)

2025-05-21 19:48:59 -07:00

bench_rotary_embedding.py

Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077)

2025-08-12 01:46:40 -07:00

bench_top_k_top_p_sampling.py

[sgl-kernel] Support FlashInfer top_k_top_p_sampling_from_logits (#9060)

2025-08-14 10:56:36 -07:00