Commit Graph

103 Commits

Author SHA1 Message Date
Cody Yu
4cb9aaedf3 Fix logprobs with logprob_start_len (#193) 2024-02-22 10:33:03 -08:00
psych0v0yager
9de9a46815 Added the ability to Modify the Context Length (#210) 2024-02-20 16:22:56 -08:00
Liangsheng Yin
91e036334f Adjust outlines version. (#200) 2024-02-17 13:40:39 +08:00
Co-authored-by: comaniac <hao.yu.cody@gmail.com>
Cody Yu
2a74748b2f Pin outlines version (#196) 2024-02-16 13:01:40 -08:00
Cody Yu
63ba630bbb Refactor decoding logprob and add completion_tokens_wo_jump_forward (#189) 2024-02-15 10:54:20 -08:00
Lianmin Zheng
6493256b7d improve print 2024-02-12 12:43:48 +00:00
Lianmin Zheng
06008bc295 Fix server launch for jupyter notebook (#186) 2024-02-12 04:43:14 -08:00
Lianmin Zheng
bb824da41a Add Together and AzureOpenAI examples (#184) 2024-02-12 01:06:38 -08:00
Yaya Sy
931213245c correct reference dtype openai.py (#181) 2024-02-11 13:26:20 -08:00
Lianmin Zheng
624b21e742 Update version to 0.1.12 (#178) 2024-02-11 06:43:45 -08:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Cody Yu
50afed4eaa Support extra field regex in OpenAI API (#172) 2024-02-10 17:21:33 -08:00
Cody Yu
4d303c4fa3 Fix token usage with jump forward (#174) 2024-02-09 20:06:15 -08:00
Liangsheng Yin
37b42297f8 import outlines (#168) 2024-02-09 10:13:02 +08:00
Cody Yu
cba5027332 Fix BaseCache metric (#170) 2024-02-08 17:23:09 -08:00
Ying Sheng
a6aa46dd3f minor 2024-02-08 04:35:25 +00:00
Srinivas Billa
405f26b00b Add Auth Token to RuntimeEndpoint (#162) 2024-02-07 20:07:31 -08:00
Liangsheng Yin
b1a3a454ee add --disable-disk-cache (#160) 2024-02-08 00:50:12 +08:00
Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com>
Cody Yu
26c3494152 [Submodule] Change FlashInfer to import (#156) 2024-02-06 19:28:29 -08:00
Lianmin Zheng
23f05005fd Format code & move functions (#155) 2024-02-06 13:27:46 -08:00
Cody Yu
a7334aeea1 Support decode token logprobs (#130) 2024-02-06 12:24:55 -08:00
Arcmoon
3ae78a09b3 Add gptq quantization model support (#141) 2024-02-06 11:35:04 -08:00
Cody Yu
ccbe1e67d8 Temporary fix OpenAI API for Pydantic v1/v2 (#153) 2024-02-06 11:34:15 -08:00
LiviaSun
e2bf732bc3 add openai error handler with retry and logger (#148) 2024-02-05 20:38:41 -08:00
Cody Yu
322421fae3 Add warmup to SRT server (#146) 2024-02-05 14:21:16 -08:00
Liangsheng Yin
26f0bedc8f jump-forward rename (#144) 2024-02-05 16:50:37 +08:00
Yaya Sy
82fa69b3cc fix undefined variable (#142) 2024-02-04 14:27:52 -08:00
Liangsheng Yin
bb3a3b6675 Support Faster JSON decoding for llava (#137) 2024-02-03 23:32:05 +08:00
When sending fast-forwarded reqs to model_rpc, re-calculate `pad_input_ids`
Ying Sheng
45d6592d40 Fix no-cache mode (#136) 2024-02-03 04:59:06 -08:00
Ying Sheng
f6bfe3aaff Release 0.1.11 (#134) 2024-02-03 02:50:13 -08:00
Ying Sheng
e095b16236 Add max_prefill_num_token into server arguments (#133) 2024-02-03 02:35:54 -08:00
Ying Sheng
67be11c790 fix bug of race condition in copy() 2024-02-03 01:38:00 -08:00
Liangsheng Yin
cd8c3ccd95 Fix is_multimodal_model judge (#132) 2024-02-03 11:48:01 +08:00
Christopher Chou
864425300f Yi-VL Model (#112) 2024-02-01 08:33:22 -08:00
Lianmin Zheng
c7af9f7393 Fix a bug in llava-hd 2024-01-31 18:52:15 +00:00
Lianmin Zheng
ad82bac6f5 Fix model loading & format code (#125) 2024-01-30 23:49:52 -08:00
Cody Yu
71b54eea7d Add cache metrics (#119) 2024-01-30 22:13:14 -08:00
Lianmin Zheng
74b3bfaaf8 format code 2024-01-30 16:36:10 +00:00
Jay Zhou
4a634cf646 [Feature] Allow specifying all ports to use in advance (#116) 2024-01-30 08:34:51 -08:00
Lianmin Zheng
a49dc52bfa release v0.1.10 2024-01-30 15:37:52 +00:00
Lianmin Zheng
873d0e8537 Ignore detokenization error 2024-01-30 14:52:06 +00:00
Keith Stevens
1d0fbe8e43 [Feature] Adds basic support for image content in OpenAI chat routes (#113) 2024-01-30 06:12:33 -08:00
Lianmin Zheng
97aa9b3284 Improve docs & Add JSON decode example (#121) 2024-01-30 05:45:27 -08:00
Lianmin Zheng
0617528632 Update quick start examples (#120) 2024-01-30 04:29:32 -08:00
Lianmin Zheng
4ea92f8307 Format code (#118) 2024-01-29 17:08:12 -08:00
Junyang Lin
6b0af2853c Add qwen2 (#114) 2024-01-29 17:06:02 -08:00
Lianmin Zheng
6f560c761b Improve the control of streaming and improve the first token latency in streaming (#117) 2024-01-29 17:05:42 -08:00
Cody Yu
cd6872334e Fix Mistral model loading (#108) 2024-01-26 09:38:43 -08:00
Co-authored-by: johndun <dunavent.jm@gmail.com>
Liangsheng Yin
81561f8e2d Flush Cache API (#103) 2024-01-25 21:32:59 -08:00
Cody Yu
3a581e9949 Dynamic model class loading (#101) 2024-01-25 15:29:07 -08:00