Commit Graph

137 Commits

Author SHA1 Message Date
Enrique Shockwave
cf9d8efdd3 llama3 instruct template (#372) 2024-04-21 09:40:12 -07:00
Liangsheng Yin
1bf1cf1953 Reduce overhead when fork(1) (#375) 2024-04-21 17:25:14 +08:00
SimoneRaponi
ff99c38a07 Add timeout to get_meta_info (#346)
Co-authored-by: simone <simone.raponi@equixely.com>
2024-04-03 22:22:06 +08:00
Junlong Li
cb389c91bc Fix llava parallelism/fork bug (#315) 2024-03-28 19:24:54 -07:00
Liangsheng Yin
2af565b3bb [model] DBRX-instruct support (#337) 2024-03-28 10:05:19 -07:00
Liangsheng Yin
3842eba5fa Logprobs Refractor (#331) 2024-03-28 14:34:49 +08:00
Jani Monoses
e57f079275 Use Anthropic messages API (#304) 2024-03-22 13:23:31 -07:00
Liangsheng Yin
89885b31ef Gemma Support (#256) 2024-03-11 12:14:27 +08:00
Lin Tianchuan
30d67b2bca Add set_var to interpreter.py (#263) 2024-03-07 23:20:11 +08:00
Xinwei Xiong
b0b722ee8e Refactor ChatTemplate for Enhanced Clarity and Efficiency (#201) 2024-03-03 17:52:36 +08:00
Enrique Shockwave
9759d927cf fix chatml template (#195) 2024-02-24 16:34:22 +08:00
Zhang Wenbin
8d0a7fae3b Fix interpreter.py get_var(var_name) in text iter when stream is not enabled (#198) 2024-02-24 16:27:34 +08:00
Liangsheng Yin
c4e9ebe3a4 Fix stop str merging (#225)
Co-authored-by: Enrique Shockwave <33002121+qeternity@users.noreply.github.com>
2024-02-24 16:05:21 +08:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Lianmin Zheng
23f05005fd Format code & move functions (#155) 2024-02-06 13:27:46 -08:00
Ying Sheng
67be11c790 fix bug of race condition in copy() 2024-02-03 01:38:00 -08:00
Christopher Chou
864425300f Yi-VL Model (#112) 2024-02-01 08:33:22 -08:00
Lianmin Zheng
0617528632 Update quick start examples (#120) 2024-01-30 04:29:32 -08:00
parasol-aser
23950056f0 support speculative execution for openai API (#48)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-01-25 01:57:06 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
2024-01-25 01:16:25 +08:00
Lianmin Zheng
7358fa64f7 Fix a bug in runtime backend 2024-01-23 22:10:17 +00:00
Lianmin Zheng
9a16fea012 Return logprob for choices (#87) 2024-01-23 05:07:30 -08:00
Lianmin Zheng
959c4174b2 Fix the chat template for QWen (#83) 2024-01-22 21:46:47 -08:00
Lianmin Zheng
94e05770db Fix after QWen support (#82) 2024-01-22 21:17:05 -08:00
Lianmin Zheng
007eeb4eb9 Fix the error message and dependency of openai backend (#71) 2024-01-21 14:56:25 -08:00
Lianmin Zheng
723f042163 release v0.1.7 & fix bugs 2024-01-21 10:31:02 +00:00
Liangsheng Yin
40ab1f0129 Fix the possible bug of decode out of memory (#36) 2024-01-19 11:01:15 -08:00
Lianmin Zheng
2b079f8931 Increase interpreter parallelism (#46) 2024-01-18 13:30:10 -08:00
Lianmin Zheng
22ec7bc2a1 Expose more arguments to control the scheduling policy (#32) 2024-01-17 18:37:02 -08:00
Christopher Chou
c0454b323c Add option to return metadata in async streaming (#18) 2024-01-17 18:15:02 -08:00
Lianmin Zheng
bf51ddc6e5 Improve docs & Rename Gemini -> VertexAI (#19) 2024-01-17 02:54:41 -08:00
shiyi.c_98
fd7c479239 Gemini Backend (#9)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-01-16 22:29:37 -08:00
Lianmin Zheng
c4707f1bb5 Improve docs (#17) 2024-01-16 19:53:55 -08:00
Lianmin Zheng
4bd8233f2c Fix test cases (#6) 2024-01-15 01:15:53 -08:00
Liangsheng Yin
08ab2a1655 Json Decode && Mutl-Turns (#4) 2024-01-15 00:49:29 -08:00
Liangsheng Yin
331848de9d Add SRT json decode example (#2) 2024-01-09 12:35:44 -08:00
Lianmin Zheng
22085081bb release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-01-08 04:37:50 +00:00