sglang/benchmark at d9d35def3db5ebfd02e26173458e207b2732ecfd

EngineX-Hygon/sglang

ChangyiYang 485a023bd8 refactor apply_w8a8_block_fp8_linear in fp (#6545)

2025-05-29 00:15:11 -07:00

bench_awq_dequant.py

Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)

2025-03-12 00:10:02 -07:00

bench_fp8_blockwise_gemm.py

refactor apply_w8a8_block_fp8_linear in fp (#6545)

2025-05-29 00:15:11 -07:00

bench_fp8_gemm.py

support w8a8 fp8 kernel with CUTLASS (#3047)

2025-01-26 15:46:51 +08:00

bench_int8_gemm.py

Add shapes for int8 gemm benchmark (#3093)

2025-01-24 12:27:30 +08:00

bench_lightning_attention_decode.py

[Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466)

2025-03-16 00:02:47 -07:00

bench_moe_align_block_size.py

Revert "fix some typos" (#6244)

2025-05-12 12:53:26 -07:00

bench_moe_fused_gate.py

Add deepseek style fused moe group gate selection kernel (#4530)

2025-03-29 11:51:45 -07:00

bench_moe_topk_softmax.py

Add moe topk softmax templated from vllm (#4302)

2025-03-14 12:03:33 -07:00

bench_per_tensor_quant_fp8.py

update variable naming and comments for rocm (#5299)

2025-04-11 23:15:05 -07:00

bench_per_token_group_quant_8bit.py

Add typo checker in pre-commit (#6179)

2025-05-11 12:55:00 +08:00

bench_per_token_quant_fp8.py

update variable naming and comments for rocm (#5299)

2025-04-11 23:15:05 -07:00

bench_qserve_w4a8_gemm.py

[1/2] Support Qserve (#6457)

2025-05-21 19:48:59 -07:00