sglang/layers at 86a2c473b775f9051f460b4107a34c5e662fd1a3

EngineX-Hygon/sglang

Lianmin Zheng 384d85ba35 Re-introduce get_cuda_graph_seq_len_fill_value (#1783)

2024-10-24 13:30:11 -07:00

Re-introduce get_cuda_graph_seq_len_fill_value (#1783)

2024-10-24 13:30:11 -07:00

[Performance, hardware] MoE tuning update to AMD MI300x GPUs (#1619)

2024-10-10 22:48:15 -07:00

Remove references to squeezellm (#1603)

2024-10-07 11:30:41 -07:00

activation.py

Optimize broadcast & Reorg code (#1598)

2024-10-07 13:19:23 -07:00

layernorm.py

Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)

2024-10-06 22:54:05 -07:00

linear.py

Update vllm to 0.6.3 (#1711) (#1720)

2024-10-19 20:45:41 -07:00

logits_processor.py

Clean up batch data structures: Introducing ModelWorkerBatch (#1544)

2024-09-30 06:41:49 -07:00

pooler.py

Rename InputMetadata -> ForwardBatch (#1543)

2024-09-30 02:41:11 -07:00

radix_attention.py

Simplify flashinfer dispatch (#1552)

2024-10-01 00:28:42 -07:00

rotary_embedding.py

Support qwen2 vl model (#1721)

2024-10-19 21:44:38 -07:00

sampler.py

Fix the perf regression due to additional_stop_token_ids (#1773)

2024-10-23 16:45:21 -07:00

torchao_utils.py

Add float8 dynamic quant to torchao_utils (#1528)

2024-09-28 12:27:54 -07:00