EngineX-Hygon/sglang
86a2c473b775f9051f460b4107a34c5e662fd1a3
sglang/python/sglang/srt/layers
Lianmin Zheng 384d85ba35 Re-introduce get_cuda_graph_seq_len_fill_value (#1783)
2024-10-24 13:30:11 -07:00
Name                 Last commit                                                                 Date
attention            Re-introduce get_cuda_graph_seq_len_fill_value (#1783)                      2024-10-24 13:30:11 -07:00
fused_moe            [Performance, hardware] MoE tuning update to AMD MI300x GPUs (#1619)        2024-10-10 22:48:15 -07:00
quantization         Remove references to squeezellm (#1603)                                     2024-10-07 11:30:41 -07:00
activation.py        Optimize broadcast & Reorg code (#1598)                                     2024-10-07 13:19:23 -07:00
layernorm.py         Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)  2024-10-06 22:54:05 -07:00
linear.py            Update vllm to 0.6.3 (#1711) (#1720)                                        2024-10-19 20:45:41 -07:00
logits_processor.py  Clean up batch data structures: Introducing ModelWorkerBatch (#1544)        2024-09-30 06:41:49 -07:00
pooler.py            Rename InputMetadata -> ForwardBatch (#1543)                                2024-09-30 02:41:11 -07:00
radix_attention.py   Simplify flashinfer dispatch (#1552)                                        2024-10-01 00:28:42 -07:00
rotary_embedding.py  Support qwen2 vl model (#1721)                                              2024-10-19 21:44:38 -07:00
sampler.py           Fix the perf regression due to additional_stop_token_ids (#1773)            2024-10-23 16:45:21 -07:00
torchao_utils.py     Add float8 dynamic quant to torchao_utils (#1528)                           2024-09-28 12:27:54 -07:00