sglang/gemm at 44f47d3ee1e66ecce73d2e98c8847cd94ab54ea7

EngineX-Hygon/sglang

Yi Pan 45fdf1f7f3 Fix shared memory OOM on sm86 GPUs. (#4797)

2025-03-26 10:41:53 -07:00

awq_kernel.cu

[1/3] fix dsv3 awq issue (#4556)

2025-03-22 01:07:17 -07:00

bmm_fp8.cu

Move rope and bmm into sgl-kernel (#4241)

2025-03-09 18:38:15 -07:00

cublas_grouped_gemm.cu

Simplify tests & Fix trtllm custom allreduce registration (#4252)

2025-03-10 01:24:22 -07:00

fp8_blockwise_gemm_kernel.cu

Support Blackwell Block Scale FP8 Gemm (#4278)

2025-03-12 14:17:11 -07:00

fp8_gemm_kernel.cu

Support fp8 gemm for blackwell (#4558)

2025-03-20 12:40:28 -07:00

int8_gemm_kernel.cu

Fix shared memory OOM on sm86 GPUs. (#4797)

2025-03-26 10:41:53 -07:00

nvfp4_quant_entry.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_quant_kernels.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_scaled_mm_entry.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

nvfp4_scaled_mm_kernels.cu

Support FP4 gemm (1/2) (#3899)

2025-03-24 19:50:23 -07:00

per_tensor_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00

per_token_group_quant_8bit.cu

[Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396)

2025-03-23 23:44:17 -07:00

per_token_quant_fp8.cu

Speed up per token and per tensor quant by 15% (#4639)

2025-03-22 00:37:57 -07:00