sglang/python/sglang/srt/layers/moe (tree 9045cc1eb8daa77e6d4d271e3bdebc6e26584303)
Latest commit: 9045cc1eb8 by Xiaoyu Zhang, 2025-07-25 21:17:47 +08:00: [torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering torch.compile in forward pass (#8353)
ep_moe/
    [AMD] Remove vllm's scaled_fp8_quant and moe_sum when SGLANG_USE_AITER=1 (#7484)
    2025-07-21 17:33:19 -07:00
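The SGLANG_USE_AITER=1 switch referenced in #7484 gates the MoE kernels on an environment variable: with AMD's AITER enabled, the vLLM fallbacks (scaled_fp8_quant, moe_sum) are dropped from the dispatch path. A minimal sketch of that gating pattern, assuming a moe_sum entry point in the aiter package; the names here are illustrative, not sglang's actual dispatch code.

```python
import os

import torch

# Backend gate, read once at import time (flag name taken from the PR title).
_USE_AITER = os.environ.get("SGLANG_USE_AITER", "0") == "1"

def moe_sum(expert_out: torch.Tensor) -> torch.Tensor:
    # Illustrative dispatch: prefer the AMD AITER kernel when enabled,
    # otherwise fall back to a native PyTorch reduction instead of the
    # removed vLLM op.
    if _USE_AITER:
        import aiter  # assumed AMD AITER package and entry point
        return aiter.moe_sum(expert_out)
    return expert_out.sum(dim=1)  # expert_out: [num_tokens, top_k, hidden]
```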
fused_moe_triton/
    [code style] Clean dead triton kernel code in fused_moe and useless vllm_ops import (#8310)
    2025-07-24 14:38:30 +08:00
cutlass_moe_params.py
    [CUTLASS-FP4-MOE] Introduce CutlassMoEParams class for easy initialization of Cutlass Grouped Gemms Metadata (#6887)
    2025-06-05 13:13:14 -07:00
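The point of #6887 is initialization ergonomics: bundle the metadata the Cutlass grouped GEMMs need into one params object instead of threading loose arguments through every call site. A rough sketch of that shape with invented field names, not the real CutlassMoEParams definition.

```python
from dataclasses import dataclass

import torch

@dataclass
class GroupedGemmParams:
    # Invented fields standing in for grouped-GEMM metadata: how many
    # expert GEMMs run per layer, their problem sizes, and the output dtype.
    num_experts: int
    hidden_size: int
    intermediate_size: int
    out_dtype: torch.dtype = torch.bfloat16

    @classmethod
    def from_config(cls, cfg) -> "GroupedGemmParams":
        # A single constructor is the "easy initialization" the PR title
        # points at: build the params once, pass the object around.
        return cls(cfg.num_experts, cfg.hidden_size, cfg.intermediate_size)
```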
cutlass_moe.py
    Add a CUDA kernel for fusing mapping and weighted sum for MoE. (#6916)
    2025-06-07 15:24:39 -07:00
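#6916 fuses two steps that otherwise run as separate ops in the MoE epilogue: mapping expert-grouped outputs back into token order, then reducing each token's top-k expert outputs with its routing weights. A plain-PyTorch reference of the unfused computation the CUDA kernel replaces; the tensor layout and inverse-permutation convention are assumptions.

```python
import torch

def unpermute_and_weighted_sum(
    expert_out: torch.Tensor,    # [num_tokens * top_k, hidden], grouped by expert
    inv_perm: torch.Tensor,      # [num_tokens * top_k], (token, k) slot -> row index
    topk_weights: torch.Tensor,  # [num_tokens, top_k] routing weights
) -> torch.Tensor:
    num_tokens, top_k = topk_weights.shape
    # Step 1 (mapping): gather rows back into token-major order.
    mapped = expert_out[inv_perm].view(num_tokens, top_k, -1)
    # Step 2 (weighted sum): reduce over each token's top_k experts.
    return (mapped * topk_weights.unsqueeze(-1)).sum(dim=1)
```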
cutlass_w4a8_moe.py
    feat: support DeepSeek-R1-W4AFP8 model with ep-moe mode (#7762)
    2025-07-07 14:47:21 -07:00
fused_moe_native.py
    [1/N] MoE Refactor: refactor select_experts (#7966)
    2025-07-19 00:51:15 -07:00
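select_experts, centralized by the #7966 refactor, is the routing step: it turns per-token router logits into the ids and weights of the experts each token is sent to. A reference implementation of the standard top-k softmax variant; the signature is a simplified assumption, and the real code also covers grouped and bias-corrected top-k.

```python
import torch

def select_experts(router_logits: torch.Tensor, top_k: int,
                   renormalize: bool = True):
    # router_logits: [num_tokens, num_experts]
    probs = torch.softmax(router_logits, dim=-1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    if renormalize:
        # Make each token's selected weights sum to 1.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```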
router.py
    Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
    2025-06-29 23:16:19 -07:00
topk.py
    [torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering torch.compile in forward pass (#8353)
    2025-07-25 21:17:47 +08:00
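The #8353 title (also the head commit of this tree) describes a common torch.compile pitfall: wrapping the implementation inside the forward path, so a fresh compiled wrapper is constructed on every call instead of being created once and reused. A minimal sketch of the pitfall and the usual hoist-and-reuse fix; biased_topk_impl is a simplified stand-in for biased_grouped_topk_impl, not the actual function.

```python
import torch

def biased_topk_impl(scores: torch.Tensor, bias: torch.Tensor, k: int):
    # Simplified stand-in: bias the routing scores, then take top-k.
    return torch.topk(scores + bias, k, dim=-1)

# Pitfall: a new torch.compile wrapper is built on every forward call,
# paying the compile-machinery overhead each time.
def forward_buggy(scores, bias, k):
    return torch.compile(biased_topk_impl)(scores, bias, k)

# Fix: compile once at module scope and reuse the wrapper across calls.
_biased_topk_compiled = torch.compile(biased_topk_impl)

def forward_fixed(scores, bias, k):
    return _biased_topk_compiled(scores, bias, k)
```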