Commit Graph

30 Commits

Author SHA1 Message Date
zhyncs
2e341cd493 misc: add pre-commit config (#637) 2024-07-17 11:55:39 -07:00
Lianmin Zheng
41d1f67704 Fix flush cache (#627) 2024-07-15 20:44:04 -07:00
Ying Sheng
6a2941f4d0 Improve tensor parallel performance (#625) 2024-07-15 07:10:51 -07:00
  Co-authored-by: Mingyi <wisclmy0611@gmail.com>
Mingyi
5ac8b80677 Simplify mem state (#623) 2024-07-15 02:01:09 -07:00
Ying Sheng
bae9541e4c Update benchmark script (#621) 2024-07-14 21:38:53 +00:00
Liangsheng Yin
564a898ad9 Optimize mem indices mangement (#619) 2024-07-13 23:39:37 -07:00
Lianmin Zheng
0feca02dd9 Improve benchmark scripts (#615) 2024-07-13 15:59:04 -07:00
Lianmin Zheng
65c6577696 Improve benchmark scripts & fix llava (#613) 2024-07-13 15:00:26 -07:00
Lianmin Zheng
665815969a Enable cuda graph by default (#612) 2024-07-13 05:29:46 -07:00
Ying Sheng
dc1b8bcfaa Format (#593) 2024-07-05 10:06:17 -07:00
sglang
11616fc6bd Minor fix in compiler & format (#545) 2024-06-29 23:42:14 -07:00
Lianmin Zheng
945aa9beb2 Update readme (#568) 2024-06-27 11:37:49 -07:00
Lianmin Zheng
2e6e62e156 Increase the number of thread limitation for tp worker managers. (#567) 2024-06-26 09:33:45 -07:00
Liangsheng Yin
92cb93f390 Fix latency benchmark (#557) 2024-06-22 15:11:04 +08:00
Ying Sheng
09593e9bc9 Multi-node Tensor Parallelism (#550) 2024-06-17 20:41:24 -07:00
  Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Liangsheng Yin
40e53d65cb Add disk cache for loading ShareGPT dataset. (#542) 2024-06-13 16:37:12 +08:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
Ying Sheng
1374334d38 Fix dependency & crash issues (#539) 2024-06-12 21:23:19 -07:00
Lianmin Zheng
3bc01ac137 [Minor] improve code style 2024-06-03 18:11:34 -07:00
Lianmin Zheng
09de730dee Improve benchmark scripts & add more models (#484) 2024-05-27 14:13:26 -07:00
Lianmin Zheng
55c1643627 Improve benchmark scripts & rename some scripts (#477) 2024-05-26 12:51:45 -07:00
Ying Sheng
947bda73fe Add benchmark scripts (#476) 2024-05-26 12:09:03 -07:00
Lianmin Zheng
2cea6146d8 Improve logging & add logit cap (#471) 2024-05-24 03:48:53 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Lianmin Zheng
455c9ccc4a Update readme (#434) 2024-05-13 00:17:02 -07:00
Shannon Shen
04c0b21488 Allow input_ids in the input of the /generate endpoint (#363) 2024-05-12 15:29:00 -07:00
Liangsheng Yin
95c4e0dfac Format Benchmark Code (#399) 2024-04-28 21:06:22 +08:00
Lianmin Zheng
b240f75100 Add a parallel sampling case (#34) 2024-01-18 06:29:43 +00:00
Lianmin Zheng
70359bf31a Update benchmark scripts (#8) 2024-01-15 16:12:57 -08:00
Lianmin Zheng
22085081bb release initial code 2024-01-08 04:37:50 +00:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
  Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
  Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>