41 Commits

Author SHA1 Message Date
Lianmin Zheng
c17c578108 Simplify tokenizer manager (#1904) 2024-11-03 08:38:26 -08:00
Gleb Drozdov
a95d5589c3 Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684) 2024-10-17 18:06:52 +00:00
Michael Feil
b0facb3316 add orjson for jsonresponse (#1688) 2024-10-16 18:14:30 -07:00
havetc
ecb8bad276 Returning a per request metric for number of cached_tokens read (#1599) 2024-10-16 11:49:22 -07:00
Ying Sheng
2725f8da61 [Minor] Rename no_eos_trim to no_stop_trim (#1661) 2024-10-13 20:30:03 -07:00
Ying Sheng
4876117171 [Fix] fix eos trim inconsistency (#1650) 2024-10-13 01:07:09 -07:00
Lianmin Zheng
e37cdab0c6 Fix ignore_eos (#1645) 2024-10-12 00:36:28 -07:00
LI MOU
1d9deeacdb fix missing ignore_eos in v1/chat/completions (#1642) 2024-10-11 21:37:20 -07:00
Lianmin Zheng
f13d86f920 Add image_token in conversation.py (#1632)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2024-10-11 05:07:51 -07:00
Yiding-Lu
b503881bd2 [Bug] Fix the Image Input of Batch Generation (#1579) 2024-10-11 02:25:04 -07:00
Ying Sheng
e4780cf839 [API, Feature] Support response prefill for openai API (#1490) 2024-09-22 06:46:17 -07:00
Lianmin Zheng
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) 2024-09-15 06:36:06 -07:00
Lianmin Zheng
b912de11b0 Make stop reason a dict instead of str (#1407) 2024-09-12 20:47:31 -07:00
zifeitong
9144ed1067 Support OpenAI API json_schema response format (#1363) 2024-09-09 19:08:25 -07:00
Kaichen Zhang - NTU
662ecd9368 [Feat] Add modalities for vision server when handling pixel values for llava (#1346) 2024-09-09 02:07:34 -07:00
Christopher Chou
51c554d812 Allow more flexible assistant and system response (#1256) 2024-08-30 11:51:44 -07:00
caiyueliang
2f1d92834f [FEAT] Support batches cancel (#1222)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-26 23:28:26 +00:00
havetc
9935f97b3e [FEAT] JSON constrained support (#1125)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-08-26 09:37:26 -07:00
Lianmin Zheng
902278008a [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) 2024-08-25 14:46:34 -07:00
Lianmin Zheng
a8ae640328 Improve docs and warnings (#1164) 2024-08-20 08:31:29 -07:00
Juwan Yoo
d8476818ef feat: allow streaming for multi-prompt and/or parallel sampling (#1134) 2024-08-20 08:06:55 -07:00
yichuan~
b997a18d74 [Feat]Add support for optional start len of logprobs (#1035)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-08-18 23:45:41 -07:00
Ying Sheng
6767e2229f Support jinja as chat template file (#1104) 2024-08-14 17:43:14 -07:00
Lianmin Zheng
d84c5e70f7 Test the case when max_new_tokens is very large (#1038) 2024-08-11 16:41:03 -07:00
Ying Sheng
b16e856f11 Add openai embedding API (#997) 2024-08-09 11:19:18 -07:00
Juwan Yoo
ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) 2024-08-08 04:21:08 -07:00
yichuan~
3a79613c28 support more optioin about usage in stream mode (#985)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-08-08 09:41:57 +00:00
Ying Sheng
20a4f927dc Add io struct for embedding models [unreachable code] - step 2/3 (#987) 2024-08-08 07:52:31 +00:00
yichuan~
795eab6dda Add support for Batch API test (#936) 2024-08-06 23:52:10 -07:00
yichuan~
fd7926e46e Fix prompt len in parallel sampling (#928) 2024-08-05 00:56:08 -07:00
yichuan~
d53dcf9c98 Support more OpenAI API test (#916) 2024-08-04 16:43:09 -07:00
Ying Sheng
fbd6b94d69 Fix the double BOS problem in the HF chat template (#888) 2024-08-02 00:30:50 -07:00
yichuan~
ca600e8cd6 Add support for logprobs in OpenAI chat API (#852) 2024-08-01 00:08:21 -07:00
yichuan~
bb0501c0d9 Fix List input bug (#838) 2024-07-30 13:40:51 -07:00
yichuan~
084fa54d37 Add support for OpenAI API : offline batch(file) processing (#699)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
2024-07-29 13:07:18 -07:00
Ying Sheng
8d908a937c Fix echo + lobprob for OpenAI API when the prompt is a list (#791) 2024-07-28 17:09:16 -07:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Lianmin Zheng
30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) 2024-07-27 19:50:34 -07:00
Lianmin Zheng
f95e661757 Fix max_tokens for OpenAI chat completion API (#766) 2024-07-27 15:44:27 -07:00
Toshiki Kataoka
40facad5f1 feat: support token ids in /v1/completions (#736) 2024-07-26 02:53:17 -07:00
Mingyi
e3046ea3a8 Update OpenAI API (#667) 2024-07-19 23:20:54 -07:00