EngineX-Hygon/sglang
sglang/python/sglang/srt/layers/attention @ e8999b13b7c346297d7de88682f88a5cc35c80a0
Latest commit: Baizhou Zhang, e8999b13b7, "Replace enable_flashinfer_mla argument with attention_backend (#5005)", 2025-04-03 02:53:58 -07:00
Name                          Latest commit                                                                  Date
triton_ops                    Fix shared memory OOM on sm86 GPUs. (#4797)                                    2025-03-26 10:41:53 -07:00
base_attn_backend.py          fix(typo): fix reply to replay in base_attn_backend.py (#4784)                 2025-03-26 00:19:12 -07:00
double_sparsity_backend.py    Misc clean up; Remove the support of jump forward (#4032)                      2025-03-03 07:02:14 -08:00
flashattention_backend.py     Add Eagle Speculative Decoding to FA3 Backend (#4951)                          2025-04-02 13:09:02 -07:00
flashinfer_backend.py         Support page size > 1 + eagle (#4908)                                          2025-03-30 00:46:23 -07:00
flashinfer_mla_backend.py     Replace enable_flashinfer_mla argument with attention_backend (#5005)          2025-04-03 02:53:58 -07:00
flashmla_backend.py           [Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932)   2025-03-30 13:53:44 -07:00
torch_native_backend.py       Misc clean up; Remove the support of jump forward (#4032)                      2025-03-03 07:02:14 -08:00
triton_backend.py             [fix] fix illegal mem access and clean up triton attention backend (#4571)     2025-03-20 02:01:52 -07:00
utils.py                      Support FlashMLA backend cuda graph (#4514)                                    2025-03-19 08:25:34 -07:00
vision.py                     refactor: bug fixes and refactor for vlm (#4661)                               2025-03-22 22:48:49 -07:00