sglang/gemm at 34e07a65f192f7869a370b511aebbb084e50c1f0

EngineX-Hygon/sglang

Trevor Morris e9f8e42318 Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

awq_kernel.cu

[1/3] fix dsv3 awq issue (#4556)

2025-03-22 01:07:17 -07:00

bmm_fp8.cu

Move rope and bmm into sgl-kernel (#4241)

2025-03-09 18:38:15 -07:00

cublas_grouped_gemm.cu

Simplify tests & Fix trtllm custom allreduce registration (#4252)

2025-03-10 01:24:22 -07:00

fp8_blockwise_gemm_kernel.cu

Support Blackwell Block Scale FP8 Gemm (#4278)

2025-03-12 14:17:11 -07:00

fp8_gemm_kernel.cu

Support fp8 gemm for blackwell (#4558)

2025-03-20 12:40:28 -07:00

int8_gemm_kernel.cu

Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)

2025-03-17 00:03:43 -07:00

nvfp4_quant_entry.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_quant_kernels.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_scaled_mm_entry.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_scaled_mm_kernels.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

per_tensor_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00

per_token_group_quant_8bit.cu

[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)

2025-03-23 23:44:17 -07:00

per_token_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00