sglang/gemm at 6a384d5c012e424e5baf9891efa5465088e807dc

EngineX-Hygon/sglang

Chunan Zeng 6a384d5c01 Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00

awq_kernel.cu

Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104)

2025-03-12 00:10:02 -07:00

bmm_fp8.cu

Move rope and bmm into sgl-kernel (#4241)

2025-03-09 18:38:15 -07:00

cublas_grouped_gemm.cu

Simplify tests & Fix trtllm custom allreduce registration (#4252)

2025-03-10 01:24:22 -07:00

fp8_blockwise_gemm_kernel.cu

Support Blackwell Block Scale FP8 Gemm (#4278)

2025-03-12 14:17:11 -07:00

fp8_gemm_kernel.cu

Support fp8 gemm for blackwell (#4558)

2025-03-20 12:40:28 -07:00

int8_gemm_kernel.cu

Support serving DeepSeek-R1-Channel-INT8 with 32 L40S (#4418)

2025-03-17 00:03:43 -07:00

per_tensor_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00

per_token_group_quant_fp8.cu

fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231)

2025-03-10 01:42:58 -07:00

per_token_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00