Author | Commit | Message | Date
Lianmin Zheng | 3bc01ac137 | [Minor] improve code style | 2024-06-03 18:11:34 -07:00
Lianmin Zheng | 159cc741e4 | Make the server random by default (#493) | 2024-05-31 23:33:34 -07:00
Ying Sheng | 83525a1df2 | Revert "Make the server random by default" (#492) | 2024-05-31 12:00:21 -07:00
Lianmin Zheng | 80a33ce8b0 | Do not set the default value of global random seed (#488) | 2024-05-29 18:41:18 -04:00
Lianmin Zheng | 1a57e41679 | do not launch workers in parallel | 2024-05-27 23:00:16 -07:00
Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480); Co-authored-by: Ying Sheng <ying.sheng@databricks.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2024-05-27 21:24:10 -07:00
Lianmin Zheng | 565d727409 | improve logging & fix vllm version | 2024-05-27 15:04:23 -07:00
Lianmin Zheng | 09de730dee | Improve benchmark scripts & add more models (#484) | 2024-05-27 14:13:26 -07:00
Lianmin Zheng | 55c1643627 | Improve benchmark scripts & rename some scripts (#477) | 2024-05-26 12:51:45 -07:00
Li Bo | 2b605ab1d7 | [Feat/Fix] Refactoring Llava models into single file (#475) | 2024-05-26 12:29:51 -07:00
Liangsheng Yin | f06e90c2cf | Optimize retract (#440) | 2024-05-26 00:07:26 +08:00
Lianmin Zheng | 2cea6146d8 | Improve logging & add logit cap (#471) | 2024-05-24 03:48:53 -07:00
Lianmin Zheng | 0fafc5606b | port fp8 mixtral (#460) | 2024-05-21 11:46:35 -07:00
Lianmin Zheng | 19d2135cb8 | Use model loader from vllm (#459) | 2024-05-21 09:13:37 -07:00
Lianmin Zheng | ced77c6626 | Rename api_num_spec_tokens -> num_api_spec_tokens (#458) | 2024-05-20 18:44:23 -07:00
Lianmin Zheng | 8dbdc018a3 | Abort disconnected requests (#457) | 2024-05-20 18:41:21 -07:00
Ying Sheng | 3e684be7a3 | Fix openai speculative execution (#456) | 2024-05-20 17:01:13 -07:00
LiviaSun | ec380dfd30 | openai chat speculative execution (#250); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-05-18 22:23:53 -07:00
Liangsheng Yin | 5b647543c1 | Fix the broken --disable-radix-cache (#451) | 2024-05-19 13:00:12 +08:00
Lianmin Zheng | 8210ec60f4 | Improve error handling & abort disconnected requests (#449) | 2024-05-17 05:49:31 -07:00
Ying Sheng | 5be9eb8a8c | Add PUT for generate api (#448) | 2024-05-17 02:35:15 -07:00
Lianmin Zheng | c05956e534 | Simplify port allocation (#447) | 2024-05-16 18:07:30 -07:00
Matthias Gerstgrasser | d75dc20fae | Add finish_reason to OpenAI API (#446) | 2024-05-16 14:55:05 -07:00
Liangsheng Yin | 690d162d97 | Format code (#441) | 2024-05-14 22:40:46 +08:00
Kaichen Zhang - NTU | 664287b2a7 | [Feat] Add llava qwen, llava mistral (#419); Co-authored-by: Bo Li <drluodian@gmail.com> | 2024-05-13 22:17:50 -07:00
Lianmin Zheng | e0ae5d42ec | Update version to 0.1.16 (#438) | 2024-05-13 17:29:17 -07:00
Lianmin Zheng | 32de16ce2f | Fix streaming (#437) | 2024-05-13 17:26:18 -07:00
Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00
Lianmin Zheng | 5dc55a5f02 | Handle truncation errors (#436) | 2024-05-13 15:56:00 -07:00
Lianmin Zheng | 4231a42fa8 | Fix import of global_config | 2024-05-13 12:11:55 -07:00
Liangsheng Yin | 39191c8515 | Cache optimizations (#418) | 2024-05-13 12:47:13 +08:00
Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00
Shannon Shen | 04c0b21488 | Allow input_ids in the input of the /generate endpoint (#363) | 2024-05-12 15:29:00 -07:00
Lianmin Zheng | 6e09cf6a15 | Misc fixes (#432) | 2024-05-12 15:05:40 -07:00
Lianmin Zheng | 72bb344388 | Update version to 0.1.15 (#431) | 2024-05-12 14:22:33 -07:00
Lianmin Zheng | 2d580e7a89 | Fix flashinfer (#430) | 2024-05-12 08:18:53 -07:00
Lianmin Zheng | 3fc97f6709 | Move openai api server into a separate file (#429) | 2024-05-12 06:41:32 -07:00
Lianmin Zheng | abc548c707 | Minor fix for the import path (#428) | 2024-05-12 05:10:35 -07:00
Lianmin Zheng | aee4f523cf | Fix logit processor bugs (#427) | 2024-05-12 04:54:07 -07:00
Lianmin Zheng | 7023f413c6 | Clean up (#422) | 2024-05-11 20:55:00 -07:00
Lianmin Zheng | 09deb20dee | Optimize the memory usage of logits processor (#420) | 2024-05-11 16:56:42 -07:00
Qubitium | 33b242df30 | Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380); Co-authored-by: ZX <zx@lbx.dev>; Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com> | 2024-05-11 16:37:49 -07:00
Lianmin Zheng | a511a2d089 | restrict vllm version | 2024-05-09 15:49:29 -07:00
Liangsheng Yin | 6ec65f4555 | Make public APIs more standard. (#416) | 2024-05-09 15:39:22 +08:00
Enrique Shockwave | e2c31fca5c | Include finish reason in meta info response (#415) | 2024-05-09 15:14:01 +08:00
Liangsheng Yin | d5de20a3ee | Fix sync() when fork(1) (#412) | 2024-05-08 15:15:18 +08:00
YoungJoong Noah Kim | 4a1c6ae2ce | Add Cohere Command R chat template (#411) | 2024-05-07 15:18:15 +08:00
Liangsheng Yin | 14522e6a26 | Organize Benchmark (#381) | 2024-05-05 16:14:17 +08:00
ZhouXingg | 183df47282 | SamplingParams add "spaces_between_special_tokens" argument (#392) | 2024-04-30 16:17:12 -07:00
Joschka Braun | 5c5aba5900 | Adding RAG tracing & eval cookbook using Parea (#390) | 2024-04-30 16:13:28 -07:00