75 Commits

Author SHA1 Message Date
Lianmin Zheng
bf72b80122 [Auto Sync] Update io_struct.py (20250909) (#10236)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
2025-09-09 14:15:21 -07:00
Liangsheng Yin
e719bb0e84 [1/2] Refactor multi-tokenizer manager (#10074) 2025-09-07 19:13:34 +08:00
Shangming Cai
a25e8e42eb Move multi-tokenizer event loop to better place (#9902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 23:12:21 -07:00
ybyang
5f77e1292d Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 01:00:13 -07:00
Jonas
a0a77d937b Fix Harmony reasoning parser for and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-25 15:26:26 -07:00
Chanh Nguyen
127d4b0d5e Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-08-23 13:43:09 +08:00
Xinyuan Tong
6c855db82c Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467) 2025-08-21 17:24:25 -07:00
Chang Su
a6452b7188 bugfix: Fix output_ids extraction in detokenizer_manager (#9047) 2025-08-11 03:17:32 -07:00
Lianmin Zheng
a947154286 Revert "Support Multi Process Tokenizer Manager" (#8960) 2025-08-08 02:28:27 -07:00
ybyang
7490e3f67d Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
2025-08-08 01:45:50 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Lianmin Zheng
d18c6b3358 Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 14:33:38 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
fzyzcjy
9087694006 Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) 2025-02-21 19:50:46 -08:00
Shenggui Li
9af0e21ef5 [bug] fixed batch api for DeepSeek V3/R1 (#3754) 2025-02-21 10:28:16 -08:00
Jackmin801
5f0e7de339 [Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-10 15:54:37 -08:00
Lianmin Zheng
1dda8c5e4c Return more infos for computing average acceptance length (#3152) 2025-01-26 04:51:54 -08:00
Seungduk Kim
d77caa2b75 [#2812] Make the decode status dict capcity adjustable by a CLI param (#2839) 2025-01-19 11:36:53 -08:00
giorgiopiatti-dfinity
8b6a4486ec fix missing revision arg when loading tokenizer (#2982) 2025-01-19 11:36:07 -08:00
Lianmin Zheng
0427416b59 Fix zmq binding (#2930)
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
2025-01-16 14:36:07 -08:00
Lianmin Zheng
f65c13b559 Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902) 2025-01-15 04:54:14 -08:00
Shi Shuai
c4f9707e16 Improve: Token-In Token-Out Usage for RLHF (#2843) 2025-01-11 15:14:26 -08:00
Shi Shuai
35bdb48557 [Feature] Get Token IDs with Engine.generate() (#2636)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-29 12:28:27 -08:00
Lianmin Zheng
0ce091a82d [Minor] Improve code style (#2419) 2024-12-09 03:05:59 -08:00
Lianmin Zheng
a6ca736c8e Simplify stream_output (#2398) 2024-12-08 12:27:13 -08:00
SangBin Cho
1f09e84b9a nit: Remove busy waiting on scheduler (#2382) 2024-12-08 01:06:15 -08:00
Lianmin Zheng
d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
Lianmin Zheng
c754652fcd Fix flasky tests (#2212) 2024-11-26 23:06:20 -08:00
Ying Sheng
e1e595d702 [feat] Refactor session control interface and add CI (#2173) 2024-11-25 12:32:51 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Ying Sheng
5942dfc00a [feat] Add session control (#2073) 2024-11-20 00:36:53 -08:00
Lianmin Zheng
2558d6a675 Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042) 2024-11-15 05:02:44 -08:00
Liangsheng Yin
1e8903414a Fix possible ZMQ hanging (#1800) 2024-10-25 23:07:07 -07:00
Lianmin Zheng
fb99aaa527 [Fix] Fix --skip-tokenizer-init (#1798) 2024-10-25 18:51:59 -07:00
Ying Sheng
2fce449b1c [API] add get memory pool size (#1760)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2024-10-23 07:02:29 +00:00
Ying Sheng
2725f8da61 [Minor] Rename no_eos_trim to no_stop_trim (#1661) 2024-10-13 20:30:03 -07:00
Ying Sheng
4876117171 [Fix] fix eos trim inconsistency (#1650) 2024-10-13 01:07:09 -07:00
Lianmin Zheng
114bbc8651 Use ipc instead of tcp in zmq (#1566) 2024-10-04 00:45:52 -07:00
Lianmin Zheng
048685430d Improve process creation (#1534) 2024-09-29 02:36:12 -07:00
Lianmin Zheng
e165a9fc1b Make detokenizer_manager.py not asyncio (#1532) 2024-09-28 19:33:09 -07:00
Lianmin Zheng
902278008a [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) 2024-08-25 14:46:34 -07:00
Lianmin Zheng
5623826f73 [Minor] Improve logging and rename the health check endpoint name (#1180) 2024-08-21 19:24:36 -07:00
Shan Yu
cd10654e7e [Feat] Support update weights without restart server (#1157) 2024-08-20 13:48:24 -07:00
Lianmin Zheng
cdc8d60752 Improve the code style: more comments and remove useless packages (#1139) 2024-08-17 14:37:52 -07:00
Lianmin Zheng
d84c5e70f7 Test the case when max_new_tokens is very large (#1038) 2024-08-11 16:41:03 -07:00
gryffindor-rr
9cf0a5bada Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
2024-08-09 12:14:13 -07:00
Ying Sheng
e040a2450b Add e5-mistral embedding model - step 3/3 (#988) 2024-08-08 16:31:19 -07:00