Commit Graph

60 Commits

Author SHA1 Message Date
Lianmin Zheng
bef0b35902 Fix llava & Fix multiprocessing 2024-01-24 10:35:31 +00:00
shiyi.c_98
c6576e820c Llava-hd Support (#92) 2024-01-24 01:51:21 -08:00
Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>
Lianmin Zheng
99258181c6 set start method to spawn 2024-01-24 08:55:38 +00:00
isaac-vidas
3de54a1b55 Add health endpoint to SGLang runtime server (#90) 2024-01-23 19:00:28 -08:00
Lianmin Zheng
7358fa64f7 Fix a bug in runtime backend 2024-01-23 22:10:17 +00:00
Lianmin Zheng
9a16fea012 Return logprob for choices (#87) 2024-01-23 05:07:30 -08:00
Lianmin Zheng
9e037c822c Update README.md 2024-01-23 03:43:19 -08:00
0xWe11es.eth
9076386d90 Fix SRT endpoint api json syntax (#84) 2024-01-23 00:25:26 -08:00
Lianmin Zheng
959c4174b2 Fix the chat template for QWen (#83) 2024-01-22 21:46:47 -08:00
Lianmin Zheng
94e05770db Fix after QWen support (#82) 2024-01-22 21:17:05 -08:00
Arcmoon
63e97e5e4c Suppport qwen model and solve some problems (#75) 2024-01-22 20:14:51 -08:00
isaac-vidas
e08bca2840 Support load fine-tuned LLaVA model (#80) 2024-01-22 18:15:48 -08:00
Lianmin Zheng
cd3ccb2ed7 Add a note about triton version for older GPUs (#72) 2024-01-21 16:51:45 -08:00
Ying Sheng
3f5c2f4c4a Add an async example (#37) 2024-01-21 15:17:30 -08:00
Lianmin Zheng
007eeb4eb9 Fix the error message and dependency of openai backend (#71) 2024-01-21 14:56:25 -08:00
Ying Sheng
e8f2b155fe Update README.md 2024-01-21 02:45:58 -08:00
Lianmin Zheng
723f042163 release v0.1.7 & fix bugs 2024-01-21 10:31:02 +00:00
Lianmin Zheng
585eababa1 Improve error message of openai 2024-01-21 10:13:45 +00:00
Lianmin Zheng
cc3ada983f Bump version to 0.1.6 (#68) 2024-01-21 01:45:02 -08:00
Lianmin Zheng
a837166e6f Fix select and normalized logprobs (#67) 2024-01-21 01:39:23 -08:00
Lianmin Zheng
11f3cca64f Fix select (#64) 2024-01-20 23:20:35 -08:00
Liangsheng Yin
ca13f3b8c5 Disk FSM cache and adjust code. (#63) 2024-01-20 21:26:11 -08:00
Ikko Eltociear Ashimine
0b2efc2adc Update README.md (#58) 2024-01-19 21:00:29 -08:00
Lianmin Zheng
f30abd090a Improve error message & Add vicuna template (#57) 2024-01-19 17:03:33 -08:00
Liangsheng Yin
40ab1f0129 Fix the possible bug of decode out of memory (#36) 2024-01-19 11:01:15 -08:00
Lianmin Zheng
199e82a15d Format code & Improve readme (#52) 2024-01-18 23:51:19 -08:00
Cody Yu
23471f9aa3 Support v1/chat/completions (#50) 2024-01-18 23:43:09 -08:00
Cody Yu
61d4c93962 Support stream=True in v1/completions (#49) 2024-01-18 17:00:56 -08:00
Lianmin Zheng
98a3e8ef78 Add a llava example (#47) 2024-01-18 13:46:38 -08:00
Lianmin Zheng
2b079f8931 Increase interpreter parallelism (#46) 2024-01-18 13:30:10 -08:00
Lianmin Zheng
05b4c398df Document sampling parameters (#45) 2024-01-18 11:49:27 -08:00
Cody Yu
dafafe5b11 Use HTTP link in 3rdparty module (#42) 2024-01-18 11:18:22 -08:00
Lianmin Zheng
b240f75100 Add a parallel sampling case (#34) 2024-01-18 06:29:43 +00:00
Lianmin Zheng
501f944445 Bump version to 0.1.5 (#33) 2024-01-17 21:14:31 -08:00
Lianmin Zheng
22ec7bc2a1 Expose more arguments to control the scheduling policy (#32) 2024-01-17 18:37:02 -08:00
Christopher Chou
c0454b323c Add option to return metadata in async streaming (#18) 2024-01-17 18:15:02 -08:00
Lianmin Zheng
8024fc5eec Fix streaming (#30) 2024-01-17 16:38:20 -08:00
Lianmin Zheng
70528762bf update readme 2024-01-17 10:42:55 -08:00
Ying Sheng
71d30d6ddc Update README.md 2024-01-17 09:49:53 -08:00
Lianmin Zheng
f9d723816a Teak mem fraction (#20) 2024-01-17 04:43:17 -08:00
Lianmin Zheng
bf51ddc6e5 Improve docs & Rename Gemini -> VertexAI (#19) 2024-01-17 02:54:41 -08:00
shiyi.c_98
fd7c479239 Gemini Backend (#9) 2024-01-16 22:29:37 -08:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Lianmin Zheng
c4707f1bb5 Improve docs (#17) 2024-01-16 19:53:55 -08:00
Ying Sheng
ffe4aaee1d Fix for T4 GPUs (#16) 2024-01-16 15:49:03 -08:00
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Christopher Chou
5b27a1dce4 Rename image_url to image_file (#15) 2024-01-16 15:41:30 -08:00
Lianmin Zheng
e71d4ab3f9 Update docs (#12) 2024-01-16 06:00:48 -08:00
Lianmin Zheng
fbf42263f1 Update Readme (#11) 2024-01-16 10:48:12 +00:00
Lianmin Zheng
2ccd9fd8c5 update version to 0.1.3 2024-01-16 05:55:25 +00:00
Lianmin Zheng
46b7ea7c85 Improve Readme (#10) 2024-01-16 05:53:06 +00:00
Lianmin Zheng
70359bf31a Update benchmark scripts (#8) 2024-01-15 16:12:57 -08:00