Lianmin Zheng
|
63fbef9876
|
fix flashinfer & http log level
|
2024-07-03 23:19:33 -07:00 |
|
Lianmin Zheng
|
c7709d3abe
|
Update install commands (#583)
|
2024-07-03 02:10:59 -07:00 |
|
Ying Sheng
|
9380f50ff9
|
Turn on flashinfer by default (#578)
|
2024-07-02 02:25:07 -07:00 |
|
Lianmin Zheng
|
badf3fa020
|
Expose dtype argument (#569)
|
2024-06-27 23:30:39 -07:00 |
|
Lianmin Zheng
|
2187f36237
|
Add a new argument log_level_http to control HTTP logging (#563)
|
2024-06-25 01:16:20 -07:00 |
|
Ying Sheng
|
09593e9bc9
|
Multi-node Tensor Parallelism (#550)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2024-06-17 20:41:24 -07:00 |
|
Lianmin Zheng
|
159cc741e4
|
Make the server random by default (#493)
|
2024-05-31 23:33:34 -07:00 |
|
Ying Sheng
|
83525a1df2
|
Revert "Make the server random by default" (#492)
|
2024-05-31 12:00:21 -07:00 |
|
Lianmin Zheng
|
80a33ce8b0
|
Do not set the default value of global random seed (#488)
|
2024-05-29 18:41:18 -04:00 |
|
Ying Sheng
|
0463f7fb52
|
Support data parallelism (static) (#480)
Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2024-05-27 21:24:10 -07:00 |
|
Lianmin Zheng
|
55c1643627
|
Improve benchmark scripts & rename some scripts (#477)
|
2024-05-26 12:51:45 -07:00 |
|
Lianmin Zheng
|
0fafc5606b
|
port fp8 mixtral (#460)
|
2024-05-21 11:46:35 -07:00 |
|
Yuanhan Zhang
|
0992d85f92
|
support llava video (#426)
|
2024-05-13 16:57:00 -07:00 |
|
Liangsheng Yin
|
39191c8515
|
Cache optimizations (#418)
|
2024-05-13 12:47:13 +08:00 |
|
Lianmin Zheng
|
3fc97f6709
|
Move openai api server into a separate file (#429)
|
2024-05-12 06:41:32 -07:00 |
|
Lianmin Zheng
|
aee4f523cf
|
Fix logit processor bugs (#427)
|
2024-05-12 04:54:07 -07:00 |
|
Lianmin Zheng
|
7023f413c6
|
Clean up (#422)
|
2024-05-11 20:55:00 -07:00 |
|
Liangsheng Yin
|
62b3812b69
|
Time cost utils (#355)
|
2024-04-09 23:27:31 +08:00 |
|
Alessio Dalla Piazza
|
d5ae2ebaa2
|
Add Support for API Key Authentication (#230)
|
2024-03-11 05:16:10 -07:00 |
|
Liangsheng Yin
|
1b35547927
|
Organize server_args (#277)
|
2024-03-11 20:06:52 +08:00 |
|
Lianmin Zheng
|
faba293a0d
|
Improve gemma and documentations (#278)
|
2024-03-11 04:43:39 -07:00 |
|
Liangsheng Yin
|
89885b31ef
|
Gemma Support (#256)
|
2024-03-11 12:14:27 +08:00 |
|
psych0v0yager
|
9de9a46815
|
Added the ability to Modify the Context Length (#210)
|
2024-02-20 16:22:56 -08:00 |
|
Liangsheng Yin
|
b1a3a454ee
|
add --disable-disk-cache (#160)
Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com>
|
2024-02-08 00:50:12 +08:00 |
|
Lianmin Zheng
|
23f05005fd
|
Format code & move functions (#155)
|
2024-02-06 13:27:46 -08:00 |
|
Liangsheng Yin
|
26f0bedc8f
|
jump-forward rename (#144)
|
2024-02-05 16:50:37 +08:00 |
|
Ying Sheng
|
e095b16236
|
Add max_prefill_num_token into server arguments (#133)
|
2024-02-03 02:35:54 -08:00 |
|
Jay Zhou
|
4a634cf646
|
[Feature] Allow specifying all ports to use in advance (#116)
|
2024-01-30 08:34:51 -08:00 |
|
Lianmin Zheng
|
6f560c761b
|
Improve the control of streaming and improve the first token latency in streaming (#117)
|
2024-01-29 17:05:42 -08:00 |
|
Liangsheng Yin
|
01ee0fbc05
|
fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
|
2024-01-25 01:16:25 +08:00 |
|
Liangsheng Yin
|
40ab1f0129
|
Fix the possible bug of decode out of memory (#36)
|
2024-01-19 11:01:15 -08:00 |
|
Cody Yu
|
23471f9aa3
|
Support v1/chat/completions (#50)
|
2024-01-18 23:43:09 -08:00 |
|
Lianmin Zheng
|
22ec7bc2a1
|
Expose more arguments to control the scheduling policy (#32)
|
2024-01-17 18:37:02 -08:00 |
|
Lianmin Zheng
|
8024fc5eec
|
Fix streaming (#30)
|
2024-01-17 16:38:20 -08:00 |
|
Lianmin Zheng
|
f9d723816a
|
Tweak mem fraction (#20)
|
2024-01-17 04:43:17 -08:00 |
|
Lianmin Zheng
|
bf51ddc6e5
|
Improve docs & Rename Gemini -> VertexAI (#19)
|
2024-01-17 02:54:41 -08:00 |
|
Lianmin Zheng
|
70359bf31a
|
Update benchmark scripts (#8)
|
2024-01-15 16:12:57 -08:00 |
|
Lianmin Zheng
|
4bd8233f2c
|
Fix test cases (#6)
|
2024-01-15 01:15:53 -08:00 |
|
Liangsheng Yin
|
08ab2a1655
|
JSON Decode && Multi-Turns (#4)
|
2024-01-15 00:49:29 -08:00 |
|
Lianmin Zheng
|
22085081bb
|
release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-01-08 04:37:50 +00:00 |
|