46 Commits

Author SHA1 Message Date
Lianmin Zheng
490a1f39dd Fix cuda graph with flashinfer (#675) 2024-07-20 02:43:55 -07:00
zhyncs
2e341cd493 misc: add pre-commit config (#637) 2024-07-17 11:55:39 -07:00
Lianmin Zheng
41d1f67704 Fix flush cache (#627) 2024-07-15 20:44:04 -07:00
Ying Sheng
6a2941f4d0 Improve tensor parallel performance (#625)
Co-authored-by: Mingyi <wisclmy0611@gmail.com>
2024-07-15 07:10:51 -07:00
Mingyi
5ac8b80677 Simplify mem state (#623) 2024-07-15 02:01:09 -07:00
Ying Sheng
bae9541e4c Update benchmark script (#621) 2024-07-14 21:38:53 +00:00
Liangsheng Yin
564a898ad9 Optimize mem indices mangement (#619) 2024-07-13 23:39:37 -07:00
Lianmin Zheng
0feca02dd9 Improve benchmark scripts (#615) 2024-07-13 15:59:04 -07:00
Lianmin Zheng
65c6577696 Improve benchmark scripts & fix llava (#613) 2024-07-13 15:00:26 -07:00
Lianmin Zheng
665815969a Enable cuda graph by default (#612) 2024-07-13 05:29:46 -07:00
Liangsheng Yin
f25b76c02a add LogitsMetadata (#604) 2024-07-08 17:46:55 -07:00
Ying Sheng
dc1b8bcfaa Format (#593) 2024-07-05 10:06:17 -07:00
sglang
11616fc6bd Minor fix in compiler & format (#545) 2024-06-29 23:42:14 -07:00
Lianmin Zheng
945aa9beb2 Update readme (#568) 2024-06-27 11:37:49 -07:00
Lianmin Zheng
2e6e62e156 Increase the number of thread limitation for tp worker managers. (#567) 2024-06-26 09:33:45 -07:00
Lianmin Zheng
a385ee27bd Warmup cublas (#566) 2024-06-25 12:46:00 -07:00
Liangsheng Yin
92cb93f390 Fix latency benchmark (#557) 2024-06-22 15:11:04 +08:00
Ying Sheng
09593e9bc9 Multi-node Tensor Parallelism (#550)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-06-17 20:41:24 -07:00
Liangsheng Yin
40e53d65cb Add disk cache for loading ShareGPT dataset. (#542) 2024-06-13 16:37:12 +08:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
Ying Sheng
1374334d38 Fix dependency & crash issues (#539) 2024-06-12 21:23:19 -07:00
Lianmin Zheng
3bc01ac137 [Minor] improve code style 2024-06-03 18:11:34 -07:00
Lianmin Zheng
09de730dee Improve benchmark scripts & add more models (#484) 2024-05-27 14:13:26 -07:00
Lianmin Zheng
55c1643627 Improve benchmark scripts & rename some scripts (#477) 2024-05-26 12:51:45 -07:00
Ying Sheng
947bda73fe Add benchmark scripts (#476) 2024-05-26 12:09:03 -07:00
Lianmin Zheng
2cea6146d8 Improve logging & add logit cap (#471) 2024-05-24 03:48:53 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Lianmin Zheng
455c9ccc4a Update readme (#434) 2024-05-13 00:17:02 -07:00
Shannon Shen
04c0b21488 Allow input_ids in the input of the /generate endpoint (#363) 2024-05-12 15:29:00 -07:00
Liangsheng Yin
14522e6a26 Organize Benchmark (#381) 2024-05-05 16:14:17 +08:00
Liangsheng Yin
95c4e0dfac Format Benchmark Code (#399) 2024-04-28 21:06:22 +08:00
Liangsheng Yin
da19434c2f Benchmark Updates (#382) 2024-04-24 02:23:01 +08:00
Lianmin Zheng
e2b2f0a213 Support oai in benchmark/mmlu (#323) 2024-03-22 13:37:57 -07:00
Liangsheng Yin
ec90b9c054 Upload agent_calls.jsonl download link (#226) 2024-02-24 19:03:46 +08:00
Lianmin Zheng
8ff870bf3e improve docs 2024-02-05 11:29:08 +00:00
Liangsheng Yin
26f0bedc8f jump-forward rename (#144) 2024-02-05 16:50:37 +08:00
hnyls2002
9c121f2a45 minor fix: result dump format 2024-02-02 09:58:24 +00:00
Liangsheng Yin
79cb018e4b Add city doc benchmark mode (#129) 2024-02-01 13:38:47 +08:00
Lianmin Zheng
97aa9b3284 Improve docs & Add JSON decode example (#121) 2024-01-30 05:45:27 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
2024-01-25 01:16:25 +08:00
Lianmin Zheng
711d343530 add a batch llava example 2024-01-24 11:52:10 +00:00
Lianmin Zheng
b240f75100 Add a parallel sampling case (#34) 2024-01-18 06:29:43 +00:00
Lianmin Zheng
70359bf31a Update benchmark scripts (#8) 2024-01-15 16:12:57 -08:00
Lianmin Zheng
4bd8233f2c Fix test cases (#6) 2024-01-15 01:15:53 -08:00
Liangsheng Yin
08ab2a1655 Json Decode && Mutl-Turns (#4) 2024-01-15 00:49:29 -08:00
Lianmin Zheng
22085081bb release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-01-08 04:37:50 +00:00