Enrique Shockwave | 9759d927cf | fix chatml template (#195) | 2024-02-24 16:34:22 +08:00 |
| Zhang Wenbin | 8d0a7fae3b | Fix interpreter.py get_var(var_name) in text iter when stream is not enabled (#198) | 2024-02-24 16:27:34 +08:00 |
| Liangsheng Yin | c4e9ebe3a4 | Fix stop str merging (#225) Co-authored-by: Enrique Shockwave <33002121+qeternity@users.noreply.github.com> | 2024-02-24 16:05:21 +08:00 |
| Cody Yu | 3c2c5869ad | Support outlines > 0.0.31 (#219) | 2024-02-24 15:06:17 +08:00 |
| Cody Yu | 4cb9aaedf3 | Fix logprobs with logprob_start_len (#193) | 2024-02-22 10:33:03 -08:00 |
| psych0v0yager | 9de9a46815 | Added the ability to Modify the Context Length (#210) | 2024-02-20 16:22:56 -08:00 |
| Liangsheng Yin | 91e036334f | Adjust outlines version. (#200) Co-authored-by: comaniac <hao.yu.cody@gmail.com> | 2024-02-17 13:40:39 +08:00 |
| Cody Yu | 2a74748b2f | Pin outlines version (#196) | 2024-02-16 13:01:40 -08:00 |
| Cody Yu | 63ba630bbb | Refactor decoding logprob and add completion_tokens_wo_jump_forward (#189) | 2024-02-15 10:54:20 -08:00 |
| Lianmin Zheng | 6493256b7d | improve print | 2024-02-12 12:43:48 +00:00 |
| Lianmin Zheng | 06008bc295 | Fix server launch for jupyter notebook (#186) | 2024-02-12 04:43:14 -08:00 |
| Lianmin Zheng | bb824da41a | Add Together and AzureOpenAI examples (#184) | 2024-02-12 01:06:38 -08:00 |
| Yaya Sy | 931213245c | correct reference dtype openai.py (#181) | 2024-02-11 13:26:20 -08:00 |
| Lianmin Zheng | 624b21e742 | Update version to 0.1.12 (#178) | 2024-02-11 06:43:45 -08:00 |
| Lianmin Zheng | c51020cf0c | Fix the chat template for llava-v1.6-34b & format code (#177) | 2024-02-11 05:50:13 -08:00 |
| Cody Yu | 50afed4eaa | Support extra field regex in OpenAI API (#172) | 2024-02-10 17:21:33 -08:00 |
| Cody Yu | 4d303c4fa3 | Fix token usage with jump forward (#174) | 2024-02-09 20:06:15 -08:00 |
| Liangsheng Yin | 37b42297f8 | import outlines (#168) | 2024-02-09 10:13:02 +08:00 |
| Cody Yu | cba5027332 | Fix BaseCache metric (#170) | 2024-02-08 17:23:09 -08:00 |
| Ying Sheng | a6aa46dd3f | minor | 2024-02-08 04:35:25 +00:00 |
| Srinivas Billa | 405f26b00b | Add Auth Token to RuntimeEndpoint (#162) | 2024-02-07 20:07:31 -08:00 |
| Liangsheng Yin | b1a3a454ee | add --disable-disk-cache (#160) Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com> | 2024-02-08 00:50:12 +08:00 |
| Cody Yu | 26c3494152 | [Submodule] Change FlashInfer to import (#156) | 2024-02-06 19:28:29 -08:00 |
| Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00 |
| Cody Yu | a7334aeea1 | Support decode token logprobs (#130) | 2024-02-06 12:24:55 -08:00 |
| Arcmoon | 3ae78a09b3 | Add gptq quantization model support (#141) | 2024-02-06 11:35:04 -08:00 |
| Cody Yu | ccbe1e67d8 | Temporary fix OpenAI API for Pydantic v1/v2 (#153) | 2024-02-06 11:34:15 -08:00 |
| LiviaSun | e2bf732bc3 | add openai error handler with retry and logger (#148) | 2024-02-05 20:38:41 -08:00 |
| Cody Yu | 322421fae3 | Add warmup to SRT server (#146) | 2024-02-05 14:21:16 -08:00 |
| Liangsheng Yin | 26f0bedc8f | jump-forward rename (#144) | 2024-02-05 16:50:37 +08:00 |
| Yaya Sy | 82fa69b3cc | fix undfined variable (#142) | 2024-02-04 14:27:52 -08:00 |
| Liangsheng Yin | bb3a3b6675 | Support Faster JSON decoding for llava (#137) When sending fast-forwarded reqs to model_rpc, re-calculate `pad_input_ids` | 2024-02-03 23:32:05 +08:00 |
| Ying Sheng | 45d6592d40 | Fix no-cache mode (#136) | 2024-02-03 04:59:06 -08:00 |
| Ying Sheng | f6bfe3aaff | Release 0.1.11 (#134) | 2024-02-03 02:50:13 -08:00 |
| Ying Sheng | e095b16236 | Add max_prefill_num_token into server arguments (#133) | 2024-02-03 02:35:54 -08:00 |
| Ying Sheng | 67be11c790 | fix bug of race condition in copy() | 2024-02-03 01:38:00 -08:00 |
| Liangsheng Yin | cd8c3ccd95 | Fix is_multimodal_model judge (#132) | 2024-02-03 11:48:01 +08:00 |
| Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00 |
| Lianmin Zheng | c7af9f7393 | Fix a bug in llava-hd | 2024-01-31 18:52:15 +00:00 |
| Lianmin Zheng | ad82bac6f5 | Fix model loading & format code (#125) | 2024-01-30 23:49:52 -08:00 |
| Cody Yu | 71b54eea7d | Add cache metrics (#119) | 2024-01-30 22:13:14 -08:00 |
| Lianmin Zheng | 74b3bfaaf8 | format code | 2024-01-30 16:36:10 +00:00 |
| Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00 |
| Lianmin Zheng | a49dc52bfa | release v0.1.10 | 2024-01-30 15:37:52 +00:00 |
| Lianmin Zheng | 873d0e8537 | Ignore detokenization error | 2024-01-30 14:52:06 +00:00 |
| Keith Stevens | 1d0fbe8e43 | [Feature] Adds basic support for image content in OpenAI chat routes (#113) | 2024-01-30 06:12:33 -08:00 |
| Lianmin Zheng | 97aa9b3284 | Improve docs & Add JSON decode example (#121) | 2024-01-30 05:45:27 -08:00 |
| Lianmin Zheng | 0617528632 | Update quick start examples (#120) | 2024-01-30 04:29:32 -08:00 |
| Lianmin Zheng | 4ea92f8307 | Format code (#118) | 2024-01-29 17:08:12 -08:00 |
| Junyang Lin | 6b0af2853c | Add qwen2 (#114) | 2024-01-29 17:06:02 -08:00 |
|