32 Commits

Author SHA1 Message Date
Lianmin Zheng
902278008a [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) 2024-08-25 14:46:34 -07:00
Lianmin Zheng
5623826f73 [Minor] Improve logging and rename the health check endpoint name (#1180) 2024-08-21 19:24:36 -07:00
Shan Yu
cd10654e7e [Feat] Support update weights without restart server (#1157) 2024-08-20 13:48:24 -07:00
Lianmin Zheng
cdc8d60752 Improve the code style: more comments and remove useless packages (#1139) 2024-08-17 14:37:52 -07:00
Lianmin Zheng
d84c5e70f7 Test the case when max_new_tokens is very large (#1038) 2024-08-11 16:41:03 -07:00
gryffindor-rr
9cf0a5bada Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
2024-08-09 12:14:13 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00
Liangsheng Yin
cdcbde5fc3 Code structure refactor (#807) 2024-07-29 23:04:48 -07:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Liangsheng Yin
7620cd37dd Fix jump forward when streaming (#665) 2024-07-19 16:42:06 -07:00
Liangsheng Yin
a9ef49c12c Detokenize incrementally when streaming (#653) 2024-07-18 17:57:40 -07:00
Liangsheng Yin
0877f1e75b Fix streaming (#600) 2024-07-07 01:55:58 -07:00
Pan Lyu
26908d9568 * fix(detokenizer_manager.py): fix truncated decoded output (#586)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
2024-07-06 14:53:22 -07:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
Liangsheng Yin
9c902b1954 Decode Incrementally (#517) 2024-06-11 23:39:12 -07:00
Lianmin Zheng
f6dbd24043 Improve doc strings (#518) 2024-06-08 02:39:32 -07:00
Lianmin Zheng
91f93f141f Crash the server when error or OOM happens (#514) 2024-06-07 19:22:34 -07:00
Qubitium
f70f72586a Fix rid state map leak + Refractor .finished (#505)
Co-authored-by: ZX <zx@lbx.dev>
2024-06-07 13:20:40 -07:00
Liangsheng Yin
f06e90c2cf Optimize retract (#440) 2024-05-26 00:07:26 +08:00
Lianmin Zheng
2cea6146d8 Improve logging & add logit cap (#471) 2024-05-24 03:48:53 -07:00
Lianmin Zheng
c05956e534 Simplify port allocation (#447) 2024-05-16 18:07:30 -07:00
Liangsheng Yin
d5de20a3ee Fix sync() when fork(1) (#412) 2024-05-08 15:15:18 +08:00
ZhouXingg
183df47282 SamplingParams add "spaces_between_special_tokens" argument (#392) 2024-04-30 16:17:12 -07:00
Liangsheng Yin
150d7020ed Revert removing the unused imports (#385) 2024-04-23 22:36:33 +08:00
Liangsheng Yin
9acc6e3504 add .isort.cfg (#378) 2024-04-22 22:38:09 +08:00
Liangsheng Yin
26f0bedc8f jump-forward rename (#144) 2024-02-05 16:50:37 +08:00
Lianmin Zheng
873d0e8537 Ignore detokenization error 2024-01-30 14:52:06 +00:00
parasol-aser
23950056f0 support speculative execution for openai API (#48)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-01-25 01:57:06 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
2024-01-25 01:16:25 +08:00
Lianmin Zheng
94e05770db Fix after QWen support (#82) 2024-01-22 21:17:05 -08:00
Arcmoon
63e97e5e4c Suppport qwen model and solve some problems (#75) 2024-01-22 20:14:51 -08:00
Lianmin Zheng
22085081bb release initial code
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-01-08 04:37:50 +00:00