Author | Commit | Message | Date
Lianmin Zheng | 8e1adb8441 | Allow overwrite flashinfer use_tensorcore (#2169) | 2024-11-24 20:58:17 -08:00
Lianmin Zheng | c211e7b669 | Simplify batch update (#2154) | 2024-11-24 04:47:10 -08:00
Byron Hsu | cbedd1db1d | [router] cache-aware load-balancing router v1 (#2114) | 2024-11-23 08:34:48 -08:00
Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00
Yineng Zhang | 4f8c3aeafc | minor: update gsm8k threshold (#2125) | 2024-11-22 19:23:58 +08:00
bjmsong | ad30d5cf9a | Benchmark with Pytorch Profiler easily (#2110); Co-authored-by: root <bjmsong@126.com> | 2024-11-21 23:29:50 -08:00
Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00
James Xu | f6f713797b | Add support for Qwen2-VL-based embedding models (#2055) | 2024-11-21 14:24:25 -08:00
Lianmin Zheng | 7d671e4ad2 | Enable overlap by default (#2067) | 2024-11-19 22:07:58 -08:00
Lianmin Zheng | b110453802 | Simplify logits penalizer (#2086) | 2024-11-18 17:48:28 -08:00
Yineng Zhang | 766192610e | feat: update torch 2.5.1 (#2069) | 2024-11-18 21:29:13 +08:00
ws | 29ebe3dff4 | fix: align enable_overlap_scheduler naming between code and docs (#2038) | 2024-11-15 03:39:10 -08:00
Lianmin Zheng | aae5434bdf | Fix unit tests (#2034) | 2024-11-14 11:08:37 -08:00
Lianmin Zheng | c3eac1b010 | Fix torch.compile for MoE (#2033) | 2024-11-14 01:30:24 -08:00
James Xu | ddeb9d42de | Add engine encode (#1995); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2024-11-11 11:48:17 -08:00
Lianmin Zheng | 520f0094e4 | [CI] balance unit tests (#1977) | 2024-11-09 16:46:14 -08:00
Lianmin Zheng | 9c939a3d8b | Clean up metrics code (#1972) | 2024-11-09 15:43:20 -08:00
Yudi Xue | 95a4ed129a | Fix metrics (#1963) | 2024-11-08 23:21:11 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Lianmin Zheng | 2ce32db6fb | Let reward model take text inputs instead of message lists (#1907); Co-authored-by: Kyle Corbitt <kyle@corbt.com> | 2024-11-03 13:27:12 -08:00
Lianmin Zheng | f7102fbd2b | Fix mixed chunked prefill (#1850) | 2024-10-30 21:20:41 -07:00
Lianmin Zheng | 86fc0d79d0 | Add a watch dog thread (#1816) | 2024-10-27 02:00:50 -07:00
Lianmin Zheng | c555ce2ca2 | Revert "Fix memory leak when doing chunked prefill" (#1797) | 2024-10-25 10:24:44 -07:00
Lianmin Zheng | 40900baea7 | [Fix] Fix the log parsing in chunked prefill uni tests (#1794) | 2024-10-25 08:31:08 -07:00
Liangsheng Yin | a2f5e7555f | Fix memory leak when doing chunked prefill (#1787) | 2024-10-25 08:01:17 -07:00
Lianmin Zheng | 2148914e1b | Fix log parsing in the chunked prefill unit tests (#1793) | 2024-10-25 08:00:55 -07:00
Lianmin Zheng | 1701b0db31 | Enhance the test case for chunked prefill (#1785) | 2024-10-24 21:23:09 -07:00
Lianmin Zheng | 87a7cfa080 | Fix MockTokenizer in the unit tests (#1774) | 2024-10-23 17:47:05 -07:00
Lianmin Zheng | ad4125d1a9 | Fuse more ops & Simplify token mapping (#1758) | 2024-10-22 23:20:43 -07:00
Lianmin Zheng | 00611286a1 | Fix sliding window attention and gemma-2 unit tests in CI (#1746) | 2024-10-21 13:47:12 -07:00
Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00
Lianmin Zheng | 6790240cc3 | Fix unit test order to balance the tasks in CI (#1665) | 2024-10-14 02:01:44 -07:00
Byron Hsu | 862cd265e5 | [engine] support async and streaming (#1614) | 2024-10-11 15:26:25 -07:00
Byron Hsu | 2422de5193 | Support min_tokens in sgl.gen (#1573) | 2024-10-05 21:51:12 -07:00
Byron Hsu | dde8bb16fe | default sampling param should be deepcopied (#1581) | 2024-10-05 17:27:43 -07:00
Byron Hsu | 6bfdb4031d | [Easy] use .text() instead of .text (#1577) | 2024-10-05 11:07:41 -07:00
Ying Sheng | 04b262cd91 | [Fix] Fix major performance bug in certain cases (#1563); Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-10-04 08:51:11 +00:00
Theresa Barton | 2c7d0a5b8b | [Fix] Fix all the Huggingface paths (#1553) | 2024-10-02 10:12:07 -07:00
Lianmin Zheng | 3f0fe08d37 | Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541) | 2024-09-29 20:28:45 -07:00
Ying Sheng | 9aa6553d2a | [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) | 2024-09-27 23:32:11 -07:00
Lianmin Zheng | fb2d0680e0 | [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) | 2024-09-24 21:37:33 -07:00
Yineng Zhang | 42a2d82ba7 | minor: add mla fp8 test (#1494) | 2024-09-23 20:40:17 +08:00
Lianmin Zheng | 167591e864 | Better unit tests for adding a new model (#1488) | 2024-09-22 01:50:37 -07:00
Ke Bao | b8ccaf4d73 | Add MLA gsm8k eval (#1484) | 2024-09-21 11:16:13 +08:00
Ke Bao | a68cb201dd | Fix triton head num (#1482) | 2024-09-21 10:25:20 +08:00
Yineng Zhang | a6db88626e | minor: add quant eval compared with base (#1475) | 2024-09-20 01:57:19 +08:00
Lianmin Zheng | 1acccb364a | Fix oom issues with fp8 for llama (#1454) | 2024-09-18 03:45:19 -07:00
Lianmin Zheng | 5e62a6b706 | Add bench_server_latency.py (#1452) | 2024-09-18 00:56:06 -07:00
Lianmin Zheng | 899cf5c438 | Remove deprecated configs (#1431) | 2024-09-15 08:52:18 -07:00