sglang/layers at 087ab832236ef264746d8c75af8cd8752f56ca6b - sglang

EngineX-Hygon/sglang

HAI 087ab83223 [Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel (#1980)

2024-11-10 18:54:43 -08:00

[Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel (#1980)

2024-11-10 18:54:43 -08:00

[Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel (#1980)

2024-11-10 18:54:43 -08:00

fix black in pre-commit (#1940)

2024-11-08 07:42:47 +08:00

activation.py

Optimize broadcast & Reorg code (#1598)

2024-10-07 13:19:23 -07:00

layernorm.py

Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)

2024-10-06 22:54:05 -07:00

linear.py

Update vllm to 0.6.3 (#1711) (#1720)

2024-10-19 20:45:41 -07:00

logits_processor.py

Fix logprob in the overlapped mode (#1795)

2024-10-25 11:06:57 -07:00

pooler.py

Rename InputMetadata -> ForwardBatch (#1543)

2024-09-30 02:41:11 -07:00

radix_attention.py

Simplify flashinfer dispatch (#1552)

2024-10-01 00:28:42 -07:00

rotary_embedding.py

Qwen2vl support cuda graph and disable radix cache (#1780)

2024-10-25 10:45:17 -04:00

sampler.py

Fix the perf regression due to additional_stop_token_ids (#1773)

2024-10-23 16:45:21 -07:00

torchao_utils.py

Add float8 dynamic quant to torchao_utils (#1528)

2024-09-28 12:27:54 -07:00

vocab_parallel_embedding.py

fix black in pre-commit (#1940)

2024-11-08 07:42:47 +08:00