Lianmin Zheng
|
6f560c761b
|
Improve the control of streaming and improve the first token latency in streaming (#117)
|
2024-01-29 17:05:42 -08:00 |
|
Cody Yu
|
cd6872334e
|
Fix Mistral model loading (#108)
Co-authored-by: johndun <dunavent.jm@gmail.com>
|
2024-01-26 09:38:43 -08:00 |
|
Liangsheng Yin
|
81561f8e2d
|
Flush Cache API (#103)
|
2024-01-25 21:32:59 -08:00 |
|
Cody Yu
|
3a581e9949
|
Dynamic model class loading (#101)
|
2024-01-25 15:29:07 -08:00 |
|
shiyi.c_98
|
0147f940dd
|
fix batch error for llava-hd (#98)
|
2024-01-25 07:56:25 -08:00 |
|
parasol-aser
|
23950056f0
|
support speculative execution for openai API (#48)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-01-25 01:57:06 -08:00 |
|
isaac-vidas
|
0c457bae8f
|
Handle grayscale images in expand2square (#97)
|
2024-01-24 16:23:11 -08:00 |
|
Haotian Liu
|
d3fc86a43e
|
Improve Chinese character streaming when the last char is half Chinese word. (#95)
|
2024-01-24 12:23:27 -08:00 |
|
Liangsheng Yin
|
01ee0fbc05
|
fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
|
2024-01-25 01:16:25 +08:00 |
|
Lianmin Zheng
|
6dceab4d17
|
bump version to 0.1.9
|
2024-01-24 11:37:25 +00:00 |
|
Lianmin Zheng
|
c70b3cfa9e
|
Bump the version to v0.1.8 (#93)
|
2024-01-24 03:33:34 -08:00 |
|
Ying Sheng
|
489796c7ea
|
minor performance fix
|
2024-01-24 10:45:44 +00:00 |
|
Lianmin Zheng
|
fa7a696d04
|
Fix max_new_tokens for limited memory
|
2024-01-24 10:44:32 +00:00 |
|
Lianmin Zheng
|
bef0b35902
|
Fix llava & Fix multiprocessing
|
2024-01-24 10:35:31 +00:00 |
|
shiyi.c_98
|
c6576e820c
|
Llava-hd Support (#92)
Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>
|
2024-01-24 01:51:21 -08:00 |
|
Lianmin Zheng
|
99258181c6
|
set start method to spawn
|
2024-01-24 08:55:38 +00:00 |
|
isaac-vidas
|
3de54a1b55
|
Add health endpoint to SGLang runtime server (#90)
|
2024-01-23 19:00:28 -08:00 |
|
Lianmin Zheng
|
7358fa64f7
|
Fix a bug in runtime backend
|
2024-01-23 22:10:17 +00:00 |
|
Lianmin Zheng
|
9a16fea012
|
Return logprob for choices (#87)
|
2024-01-23 05:07:30 -08:00 |
|
Lianmin Zheng
|
959c4174b2
|
Fix the chat template for QWen (#83)
|
2024-01-22 21:46:47 -08:00 |
|
Lianmin Zheng
|
94e05770db
|
Fix after QWen support (#82)
|
2024-01-22 21:17:05 -08:00 |
|
Arcmoon
|
63e97e5e4c
|
Support qwen model and solve some problems (#75)
|
2024-01-22 20:14:51 -08:00 |
|
isaac-vidas
|
e08bca2840
|
Support load fine-tuned LLaVA model (#80)
|
2024-01-22 18:15:48 -08:00 |
|
Ying Sheng
|
3f5c2f4c4a
|
Add an async example (#37)
|
2024-01-21 15:17:30 -08:00 |
|
Lianmin Zheng
|
007eeb4eb9
|
Fix the error message and dependency of openai backend (#71)
|
2024-01-21 14:56:25 -08:00 |
|
Lianmin Zheng
|
723f042163
|
release v0.1.7 & fix bugs
|
2024-01-21 10:31:02 +00:00 |
|
Lianmin Zheng
|
585eababa1
|
Improve error message of openai
|
2024-01-21 10:13:45 +00:00 |
|
Lianmin Zheng
|
cc3ada983f
|
Bump version to 0.1.6 (#68)
|
2024-01-21 01:45:02 -08:00 |
|
Lianmin Zheng
|
a837166e6f
|
Fix select and normalized logprobs (#67)
|
2024-01-21 01:39:23 -08:00 |
|
Lianmin Zheng
|
11f3cca64f
|
Fix select (#64)
|
2024-01-20 23:20:35 -08:00 |
|
Liangsheng Yin
|
ca13f3b8c5
|
Disk FSM cache and adjust code. (#63)
|
2024-01-20 21:26:11 -08:00 |
|
Lianmin Zheng
|
f30abd090a
|
Improve error message & Add vicuna template (#57)
|
2024-01-19 17:03:33 -08:00 |
|
Liangsheng Yin
|
40ab1f0129
|
Fix the possible bug of decode out of memory (#36)
|
2024-01-19 11:01:15 -08:00 |
|
Lianmin Zheng
|
199e82a15d
|
Format code & Improve readme (#52)
|
2024-01-18 23:51:19 -08:00 |
|
Cody Yu
|
23471f9aa3
|
Support v1/chat/completions (#50)
|
2024-01-18 23:43:09 -08:00 |
|
Cody Yu
|
61d4c93962
|
Support stream=True in v1/completions (#49)
|
2024-01-18 17:00:56 -08:00 |
|
Lianmin Zheng
|
2b079f8931
|
Increase interpreter parallelism (#46)
|
2024-01-18 13:30:10 -08:00 |
|
Lianmin Zheng
|
b240f75100
|
Add a parallel sampling case (#34)
|
2024-01-18 06:29:43 +00:00 |
|
Lianmin Zheng
|
501f944445
|
Bump version to 0.1.5 (#33)
|
2024-01-17 21:14:31 -08:00 |
|
Lianmin Zheng
|
22ec7bc2a1
|
Expose more arguments to control the scheduling policy (#32)
|
2024-01-17 18:37:02 -08:00 |
|
Christopher Chou
|
c0454b323c
|
Add option to return metadata in async streaming (#18)
|
2024-01-17 18:15:02 -08:00 |
|
Lianmin Zheng
|
8024fc5eec
|
Fix streaming (#30)
|
2024-01-17 16:38:20 -08:00 |
|
Lianmin Zheng
|
f9d723816a
|
Tweak mem fraction (#20)
|
2024-01-17 04:43:17 -08:00 |
|
Lianmin Zheng
|
bf51ddc6e5
|
Improve docs & Rename Gemini -> VertexAI (#19)
|
2024-01-17 02:54:41 -08:00 |
|
shiyi.c_98
|
fd7c479239
|
Gemini Backend (#9)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-01-16 22:29:37 -08:00 |
|
Lianmin Zheng
|
c4707f1bb5
|
Improve docs (#17)
|
2024-01-16 19:53:55 -08:00 |
|
Ying Sheng
|
ffe4aaee1d
|
Fix for T4 GPUs (#16)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2024-01-16 15:49:03 -08:00 |
|
Christopher Chou
|
5b27a1dce4
|
Rename image_url to image_file (#15)
|
2024-01-16 15:41:30 -08:00 |
|
Lianmin Zheng
|
2ccd9fd8c5
|
update version to 0.1.3
|
2024-01-16 05:55:25 +00:00 |
|
Lianmin Zheng
|
70359bf31a
|
Update benchmark scripts (#8)
|
2024-01-15 16:12:57 -08:00 |
|