EngineX-Hygon/sglang
Directory: sglang/sgl-kernel/benchmark (at commit ad4e58bf67ec833ff4d036af5129ec6e1633efc4)
Latest commit: 1a3fa75f2f by JieXin Liang, "[Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends." (#4466), 2025-03-16 00:02:47 -07:00
bench_awq_dequant.py
Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)
2025-03-12 00:10:02 -07:00
bench_cublas_grouped_gemm.py
[Feature] Apply Cublas Grouped Gemm kernel (#3629)
2025-02-18 15:18:31 +08:00
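For context on the entry above: a grouped GEMM is a batch of independent matrix multiplies, possibly with different shapes per group, fused into a single kernel launch. A minimal plain-PyTorch reference of the semantics (this is what the benchmark compares against; it is not the cuBLAS grouped kernel itself):

```python
import torch

# Reference semantics of a grouped GEMM: one matmul per (A, B) pair.
# The benchmarked cuBLAS kernel performs all groups in one launch; this
# loop is only a readable stand-in for correctness checks.
def grouped_gemm_ref(a_list, b_list):
    return [a @ b for a, b in zip(a_list, b_list)]
```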
bench_fp8_blockwise_gemm.py
support blockwise fp8 matmul kernel (#3267)
2025-02-13 01:49:33 +08:00
bench_fp8_gemm.py
support w8a8 fp8 kernel with CUTLASS (#3047)
2025-01-26 15:46:51 +08:00
bench_int8_gemm.py
Add shapes for int8 gemm benchmark (#3093)
2025-01-24 12:27:30 +08:00
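The w8a8 path referenced above multiplies int8 activations by int8 weights with int32 accumulation, then dequantizes using the two per-tensor scales. A plain-PyTorch sketch of those semantics, assuming per-tensor scales (a reference for what the CUTLASS kernel computes, not its implementation):

```python
import torch

# Reference for a w8a8 int8 GEMM with per-tensor scales: accumulate the
# int8 x int8 product in int32, then dequantize to float. The fused
# CUTLASS kernel does this on the GPU in one pass.
def w8a8_int8_gemm_ref(a_q, a_scale, b_q, b_scale):
    acc = a_q.to(torch.int32) @ b_q.to(torch.int32)  # int32 accumulation
    return acc.to(torch.float32) * (a_scale * b_scale)
```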
bench_lightning_attention_decode.py
[Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466)
2025-03-16 00:02:47 -07:00
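The fix in the commit above is a one-token change: `torch.concat` is an alias of `torch.cat`, and, per the commit message, the benchmarks switched to calling `torch.cat` directly so the timed path does not detour through the Autograd backends. A minimal illustration in plain PyTorch:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# Previously: out = torch.concat([a, b], dim=0)
# After the fix, the benchmarks call torch.cat directly; the result is
# identical, only the dispatch path differs.
out = torch.cat([a, b], dim=0)  # shape: (4, 3)
```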
bench_moe_align_block_size.py
feat: support ep size < 32 for sgl kernel (#4348)
2025-03-12 20:50:46 -07:00
bench_moe_topk_softmax.py
Add moe topk softmax templated from vllm (#4302)
2025-03-14 12:03:33 -07:00
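The MoE top-k softmax above implements expert gating: softmax over the per-token expert logits, then keep the top-k probabilities and their expert indices. A plain-PyTorch reference of that semantics (the templated kernel fuses this per token; variants that renormalize the kept weights also exist):

```python
import torch

# Reference for MoE top-k softmax gating: softmax the router logits,
# then select the k highest-probability experts per token.
def moe_topk_softmax_ref(gating_logits, top_k):
    probs = torch.softmax(gating_logits, dim=-1)
    weights, expert_ids = torch.topk(probs, top_k, dim=-1)
    return weights, expert_ids
```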
bench_per_tensor_quant_fp8.py
[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)
2025-03-06 18:05:43 -08:00
bench_per_token_group_quant_fp8.py
[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)
2025-03-06 22:58:52 -08:00
bench_per_token_quant_fp8.py
fix accuracy issue (#4376)
2025-03-13 02:06:22 -07:00