Ying Sheng | 1374334d38 | Fix dependency & crash issues (#539) | 2024-06-12 21:23:19 -07:00 |
Lianmin Zheng | 94aead9e8d | Fix dependency (#538) | 2024-06-12 13:17:35 -07:00 |
Liangsheng Yin | 9c902b1954 | Decode Incrementally (#517) | 2024-06-11 23:39:12 -07:00 |
ZhouXingg | 111991fe23 | Fix Regression: Disable p2p for 4090 (#531); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> | 2024-06-11 23:27:17 -07:00 |
Qubitium | a8c787d2b3 | Add ChatGLM Model Support (#516); Co-authored-by: ZX <zx@lbx.dev> | 2024-06-11 16:39:52 -07:00 |
Fabian Preiß | 5f283991e9 | [Minor] Correct Optional type hints in api (#526) | 2024-06-11 16:37:27 -07:00 |
Fabian Preiß | 542bc733d6 | Fix missing numpy dependency in pyproject.toml (#524) | 2024-06-10 12:13:50 -07:00 |
Lianmin Zheng | f6dbd24043 | Improve doc strings (#518) | 2024-06-08 02:39:32 -07:00 |
Lianmin Zheng | e8a2327d52 | Update version to 0.1.17 (#515) | 2024-06-07 19:49:18 -07:00 |
Lianmin Zheng | 91f93f141f | Crash the server when error or OOM happens (#514) | 2024-06-07 19:22:34 -07:00 |
Qubitium | f70f72586a | Fix rid state map leak + Refractor .finished (#505); Co-authored-by: ZX <zx@lbx.dev> | 2024-06-07 13:20:40 -07:00 |
Lianmin Zheng | c0ae70c8ed | Improve logging & fix litellm dependency. (#512) | 2024-06-07 13:10:32 -07:00 |
胡译文 | 87260b7bfd | Litellm Backend (#502) | 2024-06-07 12:24:28 -07:00 |
Amos You | 651a23ee7c | remove redundant pad_input_ids function (#500) | 2024-06-07 12:23:29 -07:00 |
Lianmin Zheng | bf3e271fe0 | Update vllm to v0.4.3 (#511); Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>; Co-authored-by: ZX <zx@lbx.dev> | 2024-06-07 12:11:31 -07:00 |
Lianmin Zheng | 3bc01ac137 | [Minor] improve code style | 2024-06-03 18:11:34 -07:00 |
Lianmin Zheng | 159cc741e4 | Make the server random by default (#493) | 2024-05-31 23:33:34 -07:00 |
Ying Sheng | 83525a1df2 | Revert "Make the server random by default" (#492) | 2024-05-31 12:00:21 -07:00 |
Lianmin Zheng | 80a33ce8b0 | Do not set the default value of global random seed (#488) | 2024-05-29 18:41:18 -04:00 |
Lianmin Zheng | 1a57e41679 | do not launch workers in parallel | 2024-05-27 23:00:16 -07:00 |
Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480); Co-authored-by: Ying Sheng <ying.sheng@databricks.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2024-05-27 21:24:10 -07:00 |
Lianmin Zheng | 565d727409 | improve logging & fix vllm version | 2024-05-27 15:04:23 -07:00 |
Lianmin Zheng | 09de730dee | Improve benchmark scripts & add more models (#484) | 2024-05-27 14:13:26 -07:00 |
Lianmin Zheng | 55c1643627 | Improve benchmark scripts & rename some scripts (#477) | 2024-05-26 12:51:45 -07:00 |
Li Bo | 2b605ab1d7 | [Feat/Fix] Refactoring Llava models into single file (#475) | 2024-05-26 12:29:51 -07:00 |
Liangsheng Yin | f06e90c2cf | Optimize retract (#440) | 2024-05-26 00:07:26 +08:00 |
Lianmin Zheng | 2cea6146d8 | Improve logging & add logit cap (#471) | 2024-05-24 03:48:53 -07:00 |
Lianmin Zheng | 0fafc5606b | port fp8 mixtral (#460) | 2024-05-21 11:46:35 -07:00 |
Lianmin Zheng | 19d2135cb8 | Use model loader from vllm (#459) | 2024-05-21 09:13:37 -07:00 |
Lianmin Zheng | ced77c6626 | Rename api_num_spec_tokens -> num_api_spec_tokens (#458) | 2024-05-20 18:44:23 -07:00 |
Lianmin Zheng | 8dbdc018a3 | Abort disconnected requests (#457) | 2024-05-20 18:41:21 -07:00 |
Ying Sheng | 3e684be7a3 | Fix openai speculative execution (#456) | 2024-05-20 17:01:13 -07:00 |
LiviaSun | ec380dfd30 | openai chat speculative execution (#250); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-05-18 22:23:53 -07:00 |
Liangsheng Yin | 5b647543c1 | Fix the broken --disable-radix-cache (#451) | 2024-05-19 13:00:12 +08:00 |
Lianmin Zheng | 8210ec60f4 | Improve error handling & abort disconnected requests (#449) | 2024-05-17 05:49:31 -07:00 |
Ying Sheng | 5be9eb8a8c | Add PUT for generate api (#448) | 2024-05-17 02:35:15 -07:00 |
Lianmin Zheng | c05956e534 | Simplify port allocation (#447) | 2024-05-16 18:07:30 -07:00 |
Matthias Gerstgrasser | d75dc20fae | Add finish_reason to OpenAI API (#446) | 2024-05-16 14:55:05 -07:00 |
Liangsheng Yin | 690d162d97 | Format code (#441) | 2024-05-14 22:40:46 +08:00 |
Kaichen Zhang - NTU | 664287b2a7 | [Feat] Add llava qwen, llava mistral (#419); Co-authored-by: Bo Li <drluodian@gmail.com> | 2024-05-13 22:17:50 -07:00 |
Lianmin Zheng | e0ae5d42ec | Update version to 0.1.16 (#438) | 2024-05-13 17:29:17 -07:00 |
Lianmin Zheng | 32de16ce2f | Fix streaming (#437) | 2024-05-13 17:26:18 -07:00 |
Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00 |
Lianmin Zheng | 5dc55a5f02 | Handle truncation errors (#436) | 2024-05-13 15:56:00 -07:00 |
Lianmin Zheng | 4231a42fa8 | Fix import of global_config | 2024-05-13 12:11:55 -07:00 |
Liangsheng Yin | 39191c8515 | Cache optimizations (#418) | 2024-05-13 12:47:13 +08:00 |
Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00 |
Shannon Shen | 04c0b21488 | Allow input_ids in the input of the /generate endpoint (#363) | 2024-05-12 15:29:00 -07:00 |
Lianmin Zheng | 6e09cf6a15 | Misc fixes (#432) | 2024-05-12 15:05:40 -07:00 |
Lianmin Zheng | 72bb344388 | Update version to 0.1.15 (#431) | 2024-05-12 14:22:33 -07:00 |