Commit Graph

74 Commits

Author SHA1 Message Date
Ying Sheng
f6bfe3aaff Release 0.1.11 (#134) 2024-02-03 02:50:13 -08:00
Ying Sheng
e095b16236 Add max_prefill_num_token into server arguments (#133) 2024-02-03 02:35:54 -08:00
Ying Sheng
67be11c790 fix bug of race condition in copy() 2024-02-03 01:38:00 -08:00
Liangsheng Yin
cd8c3ccd95 Fix is_multimodal_model judge (#132) 2024-02-03 11:48:01 +08:00
Christopher Chou
864425300f Yi-VL Model (#112) 2024-02-01 08:33:22 -08:00
Lianmin Zheng
c7af9f7393 Fix a bug in llava-hd 2024-01-31 18:52:15 +00:00
Lianmin Zheng
ad82bac6f5 Fix model loading & format code (#125) 2024-01-30 23:49:52 -08:00
Cody Yu
71b54eea7d Add cache metrics (#119) 2024-01-30 22:13:14 -08:00
Lianmin Zheng
74b3bfaaf8 format code 2024-01-30 16:36:10 +00:00
Jay Zhou
4a634cf646 [Feature] Allow specifying all ports to use in advance (#116) 2024-01-30 08:34:51 -08:00
Lianmin Zheng
a49dc52bfa release v0.1.10 2024-01-30 15:37:52 +00:00
Lianmin Zheng
873d0e8537 Ignore detokenization error 2024-01-30 14:52:06 +00:00
Keith Stevens
1d0fbe8e43 [Feature] Adds basic support for image content in OpenAI chat routes (#113) 2024-01-30 06:12:33 -08:00
Lianmin Zheng
97aa9b3284 Improve docs & Add JSON decode example (#121) 2024-01-30 05:45:27 -08:00
Lianmin Zheng
0617528632 Update quick start examples (#120) 2024-01-30 04:29:32 -08:00
Lianmin Zheng
4ea92f8307 Format code (#118) 2024-01-29 17:08:12 -08:00
Junyang Lin
6b0af2853c Add qwen2 (#114) 2024-01-29 17:06:02 -08:00
Lianmin Zheng
6f560c761b Improve the control of streaming and improve the first token latency in streaming (#117) 2024-01-29 17:05:42 -08:00
Cody Yu
cd6872334e Fix Mistral model loading (#108) 2024-01-26 09:38:43 -08:00
Co-authored-by: johndun <dunavent.jm@gmail.com>
Liangsheng Yin
81561f8e2d Flush Cache API (#103) 2024-01-25 21:32:59 -08:00
Cody Yu
3a581e9949 Dynamic model class loading (#101) 2024-01-25 15:29:07 -08:00
shiyi.c_98
0147f940dd fix batch error for llava-hd (#98) 2024-01-25 07:56:25 -08:00
parasol-aser
23950056f0 support speculative execution for openai API (#48) 2024-01-25 01:57:06 -08:00
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
isaac-vidas
0c457bae8f Handle grayscale images in expand2square (#97) 2024-01-24 16:23:11 -08:00
Haotian Liu
d3fc86a43e Improve Chinese character streaming when the last char is half Chinese word. (#95) 2024-01-24 12:23:27 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode 2024-01-25 01:16:25 +08:00
Auto-detect constant str path in regex FSM, then extend instead.
Lianmin Zheng
6dceab4d17 bump version to 0.1.9 2024-01-24 11:37:25 +00:00
Lianmin Zheng
c70b3cfa9e Bump the version to v0.1.8 (#93) 2024-01-24 03:33:34 -08:00
Ying Sheng
489796c7ea minor performance fix 2024-01-24 10:45:44 +00:00
Lianmin Zheng
fa7a696d04 Fix max_new_tokens for limited memory 2024-01-24 10:44:32 +00:00
Lianmin Zheng
bef0b35902 Fix llava & Fix multiprocessing 2024-01-24 10:35:31 +00:00
shiyi.c_98
c6576e820c Llava-hd Support (#92) 2024-01-24 01:51:21 -08:00
Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>
Lianmin Zheng
99258181c6 set start method to spawn 2024-01-24 08:55:38 +00:00
isaac-vidas
3de54a1b55 Add health endpoint to SGLang runtime server (#90) 2024-01-23 19:00:28 -08:00
Lianmin Zheng
7358fa64f7 Fix a bug in runtime backend 2024-01-23 22:10:17 +00:00
Lianmin Zheng
9a16fea012 Return logprob for choices (#87) 2024-01-23 05:07:30 -08:00
Lianmin Zheng
959c4174b2 Fix the chat template for QWen (#83) 2024-01-22 21:46:47 -08:00
Lianmin Zheng
94e05770db Fix after QWen support (#82) 2024-01-22 21:17:05 -08:00
Arcmoon
63e97e5e4c Suppport qwen model and solve some problems (#75) 2024-01-22 20:14:51 -08:00
isaac-vidas
e08bca2840 Support load fine-tuned LLaVA model (#80) 2024-01-22 18:15:48 -08:00
Ying Sheng
3f5c2f4c4a Add an async example (#37) 2024-01-21 15:17:30 -08:00
Lianmin Zheng
007eeb4eb9 Fix the error message and dependency of openai backend (#71) 2024-01-21 14:56:25 -08:00
Lianmin Zheng
723f042163 release v0.1.7 & fix bugs 2024-01-21 10:31:02 +00:00
Lianmin Zheng
585eababa1 Improve error message of openai 2024-01-21 10:13:45 +00:00
Lianmin Zheng
cc3ada983f Bump version to 0.1.6 (#68) 2024-01-21 01:45:02 -08:00
Lianmin Zheng
a837166e6f Fix select and normalized logprobs (#67) 2024-01-21 01:39:23 -08:00
Lianmin Zheng
11f3cca64f Fix select (#64) 2024-01-20 23:20:35 -08:00
Liangsheng Yin
ca13f3b8c5 Disk FSM cache and adjust code. (#63) 2024-01-20 21:26:11 -08:00
Lianmin Zheng
f30abd090a Improve error message & Add vicuna template (#57) 2024-01-19 17:03:33 -08:00
Liangsheng Yin
40ab1f0129 Fix the possible bug of decode out of memory (#36) 2024-01-19 11:01:15 -08:00