EngineX-Hygon/sglang
sglang/python/sglang/srt/layers/attention @ aee62d744b08d83db8d7b55753b41cc9ebfb1155
Latest commit: aee62d744b "Optimize GPU memory usage in FlashAttentionBackend's strided indexing" (#5262)
Author: Chang Su (Co-authored-by: ch-wan <cwan39@gatech.edu>)
Date: 2025-04-11 00:34:17 -07:00
File | Last commit | Date
triton_ops | Fix shared memory OOM on sm86 GPUs. (#4797) | 2025-03-26 10:41:53 -07:00
base_attn_backend.py | fix(typo): fix reply to replay in base_attn_backend.py (#4784) | 2025-03-26 00:19:12 -07:00
double_sparsity_backend.py | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
flashattention_backend.py | Optimize GPU memory usage in FlashAttentionBackend's strided indexing (#5262) | 2025-04-11 00:34:17 -07:00
flashinfer_backend.py | Fix loading KV quantization scale; Enable modelopt kv cache (#4686) | 2025-04-08 09:11:35 -07:00
flashinfer_mla_backend.py | [Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052) | 2025-04-05 01:23:02 -07:00
flashmla_backend.py | [Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932) | 2025-03-30 13:53:44 -07:00
torch_native_backend.py | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
triton_backend.py | [fix] fix illegal mem access and clean up triton attention backend (#4571) | 2025-03-20 02:01:52 -07:00
utils.py | Support FlashMLA backend cuda graph (#4514) | 2025-03-19 08:25:34 -07:00
vision.py | refactor: bug fixes and refactor for vlm (#4661) | 2025-03-22 22:48:49 -07:00
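
For orientation only, a minimal, hedged sketch of how one of the backends listed above might be selected, assuming the attention_backend argument referenced in #5052 and SGLang's offline Engine API. The model path, backend value, and sampling parameters below are illustrative placeholders, not taken from this listing.

```python
# Hypothetical sketch: choosing an attention backend for an offline SGLang engine.
# The attention_backend value is an assumption based on the argument named in #5052;
# check the server's --help output for the values supported by your build.
import sglang as sgl

if __name__ == "__main__":
    engine = sgl.Engine(
        model_path="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        attention_backend="flashinfer",  # e.g. "flashinfer", "triton", "torch_native"
    )
    try:
        # Generate a short completion to confirm the engine runs with the chosen backend.
        output = engine.generate("Hello, world!", {"max_new_tokens": 16})
        print(output)
    finally:
        engine.shutdown()
```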