40 Commits

Author SHA1 Message Date
Lianmin Zheng
63fbef9876 fix flashinfer & http log level 2024-07-03 23:19:33 -07:00
Lianmin Zheng
c7709d3abe Update install commands (#583) 2024-07-03 02:10:59 -07:00
Ying Sheng
9380f50ff9 Turn on flashinfer by default (#578) 2024-07-02 02:25:07 -07:00
Lianmin Zheng
badf3fa020 Expose dtype argument (#569) 2024-06-27 23:30:39 -07:00
Lianmin Zheng
2187f36237 Add a new arguments log_level_http to control the HTTP logging (#563) 2024-06-25 01:16:20 -07:00
Ying Sheng
09593e9bc9 Multi-node Tensor Parallelism (#550)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-06-17 20:41:24 -07:00
Lianmin Zheng
159cc741e4 Make the server random by default (#493) 2024-05-31 23:33:34 -07:00
Ying Sheng
83525a1df2 Revert "Make the server random by default" (#492) 2024-05-31 12:00:21 -07:00
Lianmin Zheng
80a33ce8b0 Do not set the default value of global random seed (#488) 2024-05-29 18:41:18 -04:00
Ying Sheng
0463f7fb52 Support data parallelism (static) (#480)
Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2024-05-27 21:24:10 -07:00
Lianmin Zheng
55c1643627 Improve benchmark scripts & rename some scripts (#477) 2024-05-26 12:51:45 -07:00
Lianmin Zheng
0fafc5606b port fp8 mixtral (#460) 2024-05-21 11:46:35 -07:00
Yuanhan Zhang
0992d85f92 support llava video (#426) 2024-05-13 16:57:00 -07:00
Liangsheng Yin
39191c8515 Cache optimizations (#418) 2024-05-13 12:47:13 +08:00
Lianmin Zheng
3fc97f6709 Move openai api server into a separate file (#429) 2024-05-12 06:41:32 -07:00
Lianmin Zheng
aee4f523cf Fix logit processor bugs (#427) 2024-05-12 04:54:07 -07:00
Lianmin Zheng
7023f413c6 Clean up (#422) 2024-05-11 20:55:00 -07:00
Liangsheng Yin
62b3812b69 Time cost utils (#355) 2024-04-09 23:27:31 +08:00
Alessio Dalla Piazza
d5ae2ebaa2 Add Support for API Key Authentication (#230) 2024-03-11 05:16:10 -07:00
Liangsheng Yin
1b35547927 Organize server_args (#277) 2024-03-11 20:06:52 +08:00
Lianmin Zheng
faba293a0d Improve gemma and documentations (#278) 2024-03-11 04:43:39 -07:00
Liangsheng Yin
89885b31ef Gemma Support (#256) 2024-03-11 12:14:27 +08:00
psych0v0yager
9de9a46815 Added the ability to Modify the Context Length (#210) 2024-02-20 16:22:56 -08:00
Liangsheng Yin
b1a3a454ee add --disable-disk-cache (#160)
Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com>
2024-02-08 00:50:12 +08:00
Lianmin Zheng
23f05005fd Format code & move functions (#155) 2024-02-06 13:27:46 -08:00
Liangsheng Yin
26f0bedc8f jump-forward rename (#144) 2024-02-05 16:50:37 +08:00
Ying Sheng
e095b16236 Add max_prefill_num_token into server arguments (#133) 2024-02-03 02:35:54 -08:00
Jay Zhou
4a634cf646 [Feature] Allow specifying all ports to use in advance (#116) 2024-01-30 08:34:51 -08:00
Lianmin Zheng
6f560c761b Improve the control of streaming and improve the first token latency in streaming (#117) 2024-01-29 17:05:42 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
2024-01-25 01:16:25 +08:00
Liangsheng Yin
40ab1f0129 Fix the possible bug of decode out of memory (#36) 2024-01-19 11:01:15 -08:00
Cody Yu
23471f9aa3 Support v1/chat/completions (#50) 2024-01-18 23:43:09 -08:00
Lianmin Zheng
22ec7bc2a1 Expose more arguments to control the scheduling policy (#32) 2024-01-17 18:37:02 -08:00
Lianmin Zheng
8024fc5eec Fix streaming (#30) 2024-01-17 16:38:20 -08:00
Lianmin Zheng
f9d723816a Teak mem fraction (#20) 2024-01-17 04:43:17 -08:00
Lianmin Zheng
bf51ddc6e5 Improve docs & Rename Gemini -> VertexAI (#19) 2024-01-17 02:54:41 -08:00
Lianmin Zheng
70359bf31a Update benchmark scripts (#8) 2024-01-15 16:12:57 -08:00
Lianmin Zheng
4bd8233f2c Fix test cases (#6) 2024-01-15 01:15:53 -08:00
Liangsheng Yin
08ab2a1655 Json Decode && Mutl-Turns (#4) 2024-01-15 00:49:29 -08:00
Lianmin Zheng
22085081bb release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-01-08 04:37:50 +00:00