Lianmin Zheng | e074d84e5b | [Minor] more code cleanup (#4077) | 2025-03-04 21:23:47 -08:00 |
Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00 |
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00 |
fzyzcjy | 9087694006 | Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) | 2025-02-21 19:50:46 -08:00 |
Shenggui Li | 9af0e21ef5 | [bug] fixed batch api for DeepSeek V3/R1 (#3754) | 2025-02-21 10:28:16 -08:00 |
Jackmin801 | 5f0e7de339 | [Feat] Return hidden states (experimental) (#3364); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-02-10 15:54:37 -08:00 |
Lianmin Zheng | 1dda8c5e4c | Return more infos for computing average acceptance length (#3152) | 2025-01-26 04:51:54 -08:00 |
Seungduk Kim | d77caa2b75 | [#2812] Make the decode status dict capcity adjustable by a CLI param (#2839) | 2025-01-19 11:36:53 -08:00 |
giorgiopiatti-dfinity | 8b6a4486ec | fix missing revision arg when loading tokenizer (#2982) | 2025-01-19 11:36:07 -08:00 |
Lianmin Zheng | 0427416b59 | Fix zmq binding (#2930); Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com> | 2025-01-16 14:36:07 -08:00 |
Lianmin Zheng | f65c13b559 | Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902) | 2025-01-15 04:54:14 -08:00 |
Shi Shuai | c4f9707e16 | Improve: Token-In Token-Out Usage for RLHF (#2843) | 2025-01-11 15:14:26 -08:00 |
Shi Shuai | 35bdb48557 | [Feature] Get Token IDs with Engine.generate() (#2636); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2024-12-29 12:28:27 -08:00 |
Lianmin Zheng | 0ce091a82d | [Minor] Improve code style (#2419) | 2024-12-09 03:05:59 -08:00 |
Lianmin Zheng | a6ca736c8e | Simplify stream_output (#2398) | 2024-12-08 12:27:13 -08:00 |
SangBin Cho | 1f09e84b9a | nit: Remove busy waiting on scheduler (#2382) | 2024-12-08 01:06:15 -08:00 |
Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00 |
Lianmin Zheng | c754652fcd | Fix flasky tests (#2212) | 2024-11-26 23:06:20 -08:00 |
Ying Sheng | e1e595d702 | [feat] Refactor session control interface and add CI (#2173) | 2024-11-25 12:32:51 -08:00 |
Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00 |
Ying Sheng | 5942dfc00a | [feat] Add session control (#2073) | 2024-11-20 00:36:53 -08:00 |
Lianmin Zheng | 2558d6a675 | Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042) | 2024-11-15 05:02:44 -08:00 |
Liangsheng Yin | 1e8903414a | Fix possible ZMQ hanging (#1800) | 2024-10-25 23:07:07 -07:00 |
Lianmin Zheng | fb99aaa527 | [Fix] Fix --skip-tokenizer-init (#1798) | 2024-10-25 18:51:59 -07:00 |
Ying Sheng | 2fce449b1c | [API] add get memory pool size (#1760); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2024-10-23 07:02:29 +00:00 |
Ying Sheng | 2725f8da61 | [Minor] Rename no_eos_trim to no_stop_trim (#1661) | 2024-10-13 20:30:03 -07:00 |
Ying Sheng | 4876117171 | [Fix] fix eos trim inconsistency (#1650) | 2024-10-13 01:07:09 -07:00 |
Lianmin Zheng | 114bbc8651 | Use ipc instead of tcp in zmq (#1566) | 2024-10-04 00:45:52 -07:00 |
Lianmin Zheng | 048685430d | Improve process creation (#1534) | 2024-09-29 02:36:12 -07:00 |
Lianmin Zheng | e165a9fc1b | Make detokenizer_manager.py not asyncio (#1532) | 2024-09-28 19:33:09 -07:00 |
Lianmin Zheng | 902278008a | [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) | 2024-08-25 14:46:34 -07:00 |
Lianmin Zheng | 5623826f73 | [Minor] Improve logging and rename the health check endpoint name (#1180) | 2024-08-21 19:24:36 -07:00 |
Shan Yu | cd10654e7e | [Feat] Support update weights without restart server (#1157) | 2024-08-20 13:48:24 -07:00 |
Lianmin Zheng | cdc8d60752 | Improve the code style: more comments and remove useless packages (#1139) | 2024-08-17 14:37:52 -07:00 |
Lianmin Zheng | d84c5e70f7 | Test the case when max_new_tokens is very large (#1038) | 2024-08-11 16:41:03 -07:00 |
gryffindor-rr | 9cf0a5bada | Add skip_tokenizer_init args. (#959); Co-authored-by: lzhang <zhanglei@modelbest.cn> | 2024-08-09 12:14:13 -07:00 |
Ying Sheng | e040a2450b | Add e5-mistral embedding model - step 3/3 (#988) | 2024-08-08 16:31:19 -07:00 |
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00 |
Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00 |
Liangsheng Yin | 7620cd37dd | Fix jump forward when streaming (#665) | 2024-07-19 16:42:06 -07:00 |
Liangsheng Yin | a9ef49c12c | Detokenize incrementally when streaming (#653) | 2024-07-18 17:57:40 -07:00 |
Liangsheng Yin | 0877f1e75b | Fix streaming (#600) | 2024-07-07 01:55:58 -07:00 |
Pan Lyu | 26908d9568 | * fix(detokenizer_manager.py): fix truncated decoded output (#586); Co-authored-by: hnyls2002 <hnyls2002@gmail.com> | 2024-07-06 14:53:22 -07:00 |
Ying Sheng | fb9296f0ed | Higher priority for user input of max_prefill_tokens & format (#540) | 2024-06-12 21:48:40 -07:00 |
Liangsheng Yin | 9c902b1954 | Decode Incrementally (#517) | 2024-06-11 23:39:12 -07:00 |
Lianmin Zheng | f6dbd24043 | Improve doc strings (#518) | 2024-06-08 02:39:32 -07:00 |
Lianmin Zheng | 91f93f141f | Crash the server when error or OOM happens (#514) | 2024-06-07 19:22:34 -07:00 |
Qubitium | f70f72586a | Fix rid state map leak + Refractor .finished (#505); Co-authored-by: ZX <zx@lbx.dev> | 2024-06-07 13:20:40 -07:00 |
Liangsheng Yin | f06e90c2cf | Optimize retract (#440) | 2024-05-26 00:07:26 +08:00 |
Lianmin Zheng | 2cea6146d8 | Improve logging & add logit cap (#471) | 2024-05-24 03:48:53 -07:00 |