sglang/sgl_kernel at 18efb5e8e0ed467f6dc42680d88787f5ed6c074e

EngineX-Hygon/sglang

JieXin Liang 18efb5e8e0 [perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 (#6929)

2025-06-08 19:37:34 -07:00

__init__.py

Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916)

2025-06-07 15:24:39 -07:00

allreduce.py

support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)

2025-06-04 22:11:24 -07:00

attention.py

[perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 (#6929)

2025-06-08 19:37:34 -07:00

elementwise.py

[Feat] Enable PDL automatically on Hopper architecture (#5981)

2025-06-01 12:30:17 -07:00

flash_attn.py

Revert "fix some typos" (#6244)

2025-05-12 12:53:26 -07:00

gemm.py

[1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093)

2025-06-02 13:48:03 -07:00

grammar.py

fix sgl-kernel unit tests (#5666)

2025-04-23 01:18:30 -07:00

moe.py

Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916)

2025-06-07 15:24:39 -07:00

sampling.py

Fix sampler nan check when calling top_k_top_p_sampling_from_probs (#5546)

2025-04-19 21:47:23 -07:00

sparse_flash_attn.py

[Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (#5847)

2025-04-28 11:03:17 -07:00

speculative.py

use default for torch.ops (#4835)

2025-03-27 19:09:58 -07:00

utils.py

misc: cache is_hopper_arch (#6799)

2025-06-01 15:28:31 -07:00

version.py

chore: bump sgl-kernel v0.1.7 (#6963)

2025-06-08 02:43:15 -07:00