Lianmin Zheng
|
bf72b80122
|
[Auto Sync] Update io_struct.py (20250909) (#10236)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
|
2025-09-09 14:15:21 -07:00 |
|
Liangsheng Yin
|
e719bb0e84
|
[1/2] Refactor multi-tokenizer manager (#10074)
|
2025-09-07 19:13:34 +08:00 |
|
Shangming Cai
|
a25e8e42eb
|
Move multi-tokenizer event loop to better place (#9902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 23:12:21 -07:00 |
|
ybyang
|
5f77e1292d
|
Support Multi Process Tokenizer Manager (#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 01:00:13 -07:00 |
|
Jonas
|
a0a77d937b
|
Fix Harmony reasoning parser and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-25 15:26:26 -07:00 |
|
Chanh Nguyen
|
127d4b0d5e
|
Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2025-08-23 13:43:09 +08:00 |
|
Xinyuan Tong
|
6c855db82c
|
Revert "bugfix: Fix output_ids extraction in detokenizer_manager" (#9467)
|
2025-08-21 17:24:25 -07:00 |
|
Chang Su
|
a6452b7188
|
bugfix: Fix output_ids extraction in detokenizer_manager (#9047)
|
2025-08-11 03:17:32 -07:00 |
|
Lianmin Zheng
|
a947154286
|
Revert "Support Multi Process Tokenizer Manager" (#8960)
|
2025-08-08 02:28:27 -07:00 |
|
ybyang
|
7490e3f67d
|
Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
|
2025-08-08 01:45:50 -07:00 |
|
Chang Su
|
92cc32d9fc
|
Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-06 16:20:34 -07:00 |
|
Lianmin Zheng
|
d18c6b3358
|
Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 14:33:38 -07:00 |
|
Lianmin Zheng
|
177320a582
|
Clean up imports (#5467)
|
2025-04-16 15:26:49 -07:00 |
|
Lianmin Zheng
|
e074d84e5b
|
[Minor] more code cleanup (#4077)
|
2025-03-04 21:23:47 -08:00 |
|
Lianmin Zheng
|
935cda944b
|
Misc clean up; Remove the support of jump forward (#4032)
|
2025-03-03 07:02:14 -08:00 |
|
Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
fzyzcjy
|
9087694006
|
Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117)
|
2025-02-21 19:50:46 -08:00 |
|
Shenggui Li
|
9af0e21ef5
|
[bug] fixed batch api for DeepSeek V3/R1 (#3754)
|
2025-02-21 10:28:16 -08:00 |
|
Jackmin801
|
5f0e7de339
|
[Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-02-10 15:54:37 -08:00 |
|
Lianmin Zheng
|
1dda8c5e4c
|
Return more infos for computing average acceptance length (#3152)
|
2025-01-26 04:51:54 -08:00 |
|
Seungduk Kim
|
d77caa2b75
|
[#2812] Make the decode status dict capacity adjustable by a CLI param (#2839)
|
2025-01-19 11:36:53 -08:00 |
|
giorgiopiatti-dfinity
|
8b6a4486ec
|
fix missing revision arg when loading tokenizer (#2982)
|
2025-01-19 11:36:07 -08:00 |
|
Lianmin Zheng
|
0427416b59
|
Fix zmq binding (#2930)
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
|
2025-01-16 14:36:07 -08:00 |
|
Lianmin Zheng
|
f65c13b559
|
Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902)
|
2025-01-15 04:54:14 -08:00 |
|
Shi Shuai
|
c4f9707e16
|
Improve: Token-In Token-Out Usage for RLHF (#2843)
|
2025-01-11 15:14:26 -08:00 |
|
Shi Shuai
|
35bdb48557
|
[Feature] Get Token IDs with Engine.generate() (#2636)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2024-12-29 12:28:27 -08:00 |
|
Lianmin Zheng
|
0ce091a82d
|
[Minor] Improve code style (#2419)
|
2024-12-09 03:05:59 -08:00 |
|
Lianmin Zheng
|
a6ca736c8e
|
Simplify stream_output (#2398)
|
2024-12-08 12:27:13 -08:00 |
|
SangBin Cho
|
1f09e84b9a
|
nit: Remove busy waiting on scheduler (#2382)
|
2024-12-08 01:06:15 -08:00 |
|
Lianmin Zheng
|
d4fc1a70e3
|
Crash the server correctly during error (#2231)
|
2024-11-28 00:22:39 -08:00 |
|
Lianmin Zheng
|
c754652fcd
|
Fix flaky tests (#2212)
|
2024-11-26 23:06:20 -08:00 |
|
Ying Sheng
|
e1e595d702
|
[feat] Refactor session control interface and add CI (#2173)
|
2024-11-25 12:32:51 -08:00 |
|
Xuehai Pan
|
62a4a339eb
|
docs: fix module docstrings and copyright headers (#2077)
|
2024-11-22 22:16:53 +08:00 |
|
Ying Sheng
|
5942dfc00a
|
[feat] Add session control (#2073)
|
2024-11-20 00:36:53 -08:00 |
|
Lianmin Zheng
|
2558d6a675
|
Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042)
|
2024-11-15 05:02:44 -08:00 |
|
Liangsheng Yin
|
1e8903414a
|
Fix possible ZMQ hanging (#1800)
|
2024-10-25 23:07:07 -07:00 |
|
Lianmin Zheng
|
fb99aaa527
|
[Fix] Fix --skip-tokenizer-init (#1798)
|
2024-10-25 18:51:59 -07:00 |
|
Ying Sheng
|
2fce449b1c
|
[API] add get memory pool size (#1760)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2024-10-23 07:02:29 +00:00 |
|
Ying Sheng
|
2725f8da61
|
[Minor] Rename no_eos_trim to no_stop_trim (#1661)
|
2024-10-13 20:30:03 -07:00 |
|
Ying Sheng
|
4876117171
|
[Fix] fix eos trim inconsistency (#1650)
|
2024-10-13 01:07:09 -07:00 |
|
Lianmin Zheng
|
114bbc8651
|
Use ipc instead of tcp in zmq (#1566)
|
2024-10-04 00:45:52 -07:00 |
|
Lianmin Zheng
|
048685430d
|
Improve process creation (#1534)
|
2024-09-29 02:36:12 -07:00 |
|
Lianmin Zheng
|
e165a9fc1b
|
Make detokenizer_manager.py not asyncio (#1532)
|
2024-09-28 19:33:09 -07:00 |
|
Lianmin Zheng
|
902278008a
|
[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)
|
2024-08-25 14:46:34 -07:00 |
|
Lianmin Zheng
|
5623826f73
|
[Minor] Improve logging and rename the health check endpoint name (#1180)
|
2024-08-21 19:24:36 -07:00 |
|
Shan Yu
|
cd10654e7e
|
[Feat] Support update weights without restart server (#1157)
|
2024-08-20 13:48:24 -07:00 |
|
Lianmin Zheng
|
cdc8d60752
|
Improve the code style: more comments and remove useless packages (#1139)
|
2024-08-17 14:37:52 -07:00 |
|
Lianmin Zheng
|
d84c5e70f7
|
Test the case when max_new_tokens is very large (#1038)
|
2024-08-11 16:41:03 -07:00 |
|
gryffindor-rr
|
9cf0a5bada
|
Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
|
2024-08-09 12:14:13 -07:00 |
|
Ying Sheng
|
e040a2450b
|
Add e5-mistral embedding model - step 3/3 (#988)
|
2024-08-08 16:31:19 -07:00 |
|