EngineX-Hygon/sglang
sglang/python/sglang/srt/layers/attention @ e8999b13b7c346297d7de88682f88a5cc35c80a0
Latest commit: Baizhou Zhang, e8999b13b7, "Replace enable_flashinfer_mla argument with attention_backend (#5005)", 2025-04-03 02:53:58 -07:00
Name                          Latest commit                                                                  Date
triton_ops                    Fix shared memory OOM on sm86 GPUs. (#4797)                                    2025-03-26 10:41:53 -07:00
base_attn_backend.py          fix(typo): fix reply to replay in base_attn_backend.py (#4784)                 2025-03-26 00:19:12 -07:00
double_sparsity_backend.py    Misc clean up; Remove the support of jump forward (#4032)                      2025-03-03 07:02:14 -08:00
flashattention_backend.py     Add Eagle Speculative Decoding to FA3 Backend (#4951)                          2025-04-02 13:09:02 -07:00
flashinfer_backend.py         Support page size > 1 + eagle (#4908)                                          2025-03-30 00:46:23 -07:00
flashinfer_mla_backend.py     Replace enable_flashinfer_mla argument with attention_backend (#5005)          2025-04-03 02:53:58 -07:00
flashmla_backend.py           [Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932)   2025-03-30 13:53:44 -07:00
torch_native_backend.py       Misc clean up; Remove the support of jump forward (#4032)                      2025-03-03 07:02:14 -08:00
triton_backend.py             [fix] fix illegal mem access and clean up triton attention backend (#4571)     2025-03-20 02:01:52 -07:00
utils.py                      Support FlashMLA backend cuda graph (#4514)                                    2025-03-19 08:25:34 -07:00
vision.py                     refactor: bug fixes and refactor for vlm (#4661)                               2025-03-22 22:48:49 -07:00