Commit Graph

32 Commits

Author SHA1 Message Date
SangBin Cho
9208618b3e [Core] in batch prefix caching by delay scheduling (#2442) 2024-12-11 12:51:50 -08:00
Yineng Zhang
75ae968959 minor: update killall script (#2391) 2024-12-08 04:21:00 +08:00
Lianmin Zheng
d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Iñaki Arango
1363b51983 Escape backwards slash (#1902) 2024-11-03 12:27:11 -08:00
geeker-smallwhite
8ce202a493 delete unused character (#1855) 2024-10-31 19:33:55 +08:00
Lianmin Zheng
b548801ddb Update docs (#1839) 2024-10-30 02:49:08 -07:00
Chayenne
539df95d2c Imporve openai api documents (#1827) 2024-10-30 00:39:41 -07:00
  Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
Chayenne
ced362f7c6 Simplify our docs with complicated functions into utils (#1807) 2024-10-26 17:44:11 +00:00
  Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Lianmin Zheng
e4d68afcf0 [Minor] Many cleanup (#1357) 2024-09-09 04:14:11 -07:00
Lianmin Zheng
1e495e0847 [Fix] Fix select by ensuring each request has at least one token (#1318) 2024-09-03 06:31:45 -07:00
Ying Sheng
9f662501a3 Move torch.compile configs into cuda_graph_runner.py (#993) 2024-08-08 13:20:30 -07:00
Ying Sheng
0d4f3a9fcd Make API Key OpenAI-compatible (#917) 2024-08-04 13:35:44 -07:00
Ying Sheng
995af5a54b Improve the structure of CI (#911) 2024-08-03 23:09:21 -07:00
Ying Sheng
79f816292e Fix lazy import location (#795) 2024-07-28 22:09:50 -07:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
Lianmin Zheng
2cea6146d8 Improve logging & add logit cap (#471) 2024-05-24 03:48:53 -07:00
Lianmin Zheng
19d2135cb8 Use model loader from vllm (#459) 2024-05-21 09:13:37 -07:00
Lianmin Zheng
8210ec60f4 Improve error handling & abort disconnected requests (#449) 2024-05-17 05:49:31 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Yuanhan Zhang
0992d85f92 support llava video (#426) 2024-05-13 16:57:00 -07:00
Lianmin Zheng
562b8857d8 Improve error handling (#433) 2024-05-12 20:49:04 -07:00
Lianmin Zheng
13662fd533 Fix RuntimeEndpoint (#279) 2024-03-11 05:24:24 -07:00
Alessio Dalla Piazza
d5ae2ebaa2 Add Support for API Key Authentication (#230) 2024-03-11 05:16:10 -07:00
Lianmin Zheng
faba293a0d Improve gemma and documentations (#278) 2024-03-11 04:43:39 -07:00
Srinivas Billa
01b07ea3ac Add SSL Cert Functionality (#224) 2024-03-03 17:41:41 +08:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Ying Sheng
a6aa46dd3f minor 2024-02-08 04:35:25 +00:00
Srinivas Billa
405f26b00b Add Auth Token to RuntimeEndpoint (#162) 2024-02-07 20:07:31 -08:00
Haotian Liu
d3fc86a43e Improve Chinese character streaming when the last char is half Chinese word. (#95) 2024-01-24 12:23:27 -08:00
Liangsheng Yin
08ab2a1655 Json Decode && Mutl-Turns (#4) 2024-01-15 00:49:29 -08:00
Lianmin Zheng
22085081bb release initial code 2024-01-08 04:37:50 +00:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
  Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
  Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>