fzyzcjy
|
fd28640dc5
|
Add update_weights_from_tensor (#2631)
|
2024-12-28 13:30:27 -08:00 |
|
Lianmin Zheng
|
a6ca736c8e
|
Simplify stream_output (#2398)
|
2024-12-08 12:27:13 -08:00 |
|
Chayenne
|
983bfcf386
|
Online weight updates from torch.distributed (#2279)
|
2024-12-01 23:23:18 -08:00 |
|
Chayenne
|
7d1485d376
|
Add get weights by parameter name for llama (#2266)
|
2024-11-29 23:36:38 -08:00 |
|
Chayenne
|
7d5d1d3d29
|
Update weights from disk (#2265)
|
2024-11-30 01:17:00 +00:00 |
|
Lianmin Zheng
|
fe97a2d40f
|
Simplify tokenizer manager (#2254)
|
2024-11-29 02:18:51 -08:00 |
|
Rin Intachuen
|
1aea19f64b
|
Input_embeds support (#2052)
|
2024-11-25 16:35:04 -08:00 |
|
Ying Sheng
|
e1e595d702
|
[feat] Refactor session control interface and add CI (#2173)
|
2024-11-25 12:32:51 -08:00 |
|
Xuehai Pan
|
62a4a339eb
|
docs: fix module docstrings and copyright headers (#2077)
|
2024-11-22 22:16:53 +08:00 |
|
Ying Sheng
|
5942dfc00a
|
[feat] Add session control (#2073)
|
2024-11-20 00:36:53 -08:00 |
|
Lianmin Zheng
|
befc6beb86
|
Fix a typo in io_struct.py (#2008)
|
2024-11-11 16:34:10 -08:00 |
|
Chayenne
|
c77c1e05ba
|
fix black in pre-commit (#1940)
|
2024-11-08 07:42:47 +08:00 |
|
Lianmin Zheng
|
2ce32db6fb
|
Let reward model take text inputs instead of message lists (#1907)
Co-authored-by: Kyle Corbitt <kyle@corbt.com>
|
2024-11-03 13:27:12 -08:00 |
|
Lianmin Zheng
|
c17c578108
|
Simplify tokenizer manager (#1904)
|
2024-11-03 08:38:26 -08:00 |
|
Lianmin Zheng
|
838dcda162
|
Simplify tokenizer manager (#1899)
|
2024-11-03 03:52:38 -08:00 |
|
Lianmin Zheng
|
fb99aaa527
|
[Fix] Fix --skip-tokenizer-init (#1798)
|
2024-10-25 18:51:59 -07:00 |
|
Ying Sheng
|
2fce449b1c
|
[API] add get memory pool size (#1760)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2024-10-23 07:02:29 +00:00 |
|
Lianmin Zheng
|
7feba41584
|
Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)
|
2024-10-17 09:23:29 -07:00 |
|
Ying Sheng
|
2725f8da61
|
[Minor] Rename no_eos_trim to no_stop_trim (#1661)
|
2024-10-13 20:30:03 -07:00 |
|
Ying Sheng
|
4876117171
|
[Fix] fix eos trim inconsistency (#1650)
|
2024-10-13 01:07:09 -07:00 |
|
科英
|
bbd72bfc86
|
Add the ability to enable and disable the Profiler via HTTP API. (#1626)
|
2024-10-11 02:34:25 -07:00 |
|
Yiding-Lu
|
b503881bd2
|
[Bug] Fix the Image Input of Batch Generation (#1579)
|
2024-10-11 02:25:04 -07:00 |
|
Byron Hsu
|
521f862d90
|
Fix runtime.generate when sampling param is not passed (#1582)
|
2024-10-05 17:59:05 -07:00 |
|
Ying Sheng
|
f202ed9712
|
[Refactor] Simplify io_struct and tokenizer_manager (#1549)
|
2024-10-01 10:25:32 -07:00 |
|
Lianmin Zheng
|
f86c1e611f
|
Move scheduler code from tp_worker.py to scheduler.py (#1538)
|
2024-09-29 17:42:45 -07:00 |
|
Liangsheng Yin
|
fd9ad817ec
|
Organize image inputs (#1531)
|
2024-09-29 06:28:55 +00:00 |
|
Ying Sheng
|
9aa6553d2a
|
[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)
|
2024-09-27 23:32:11 -07:00 |
|
Xiao Yu
|
5752f25eef
|
Fixed n>1 causing list index out of range with VLM (#1449)
|
2024-09-18 00:46:32 -07:00 |
|
Lianmin Zheng
|
9ba1f09760
|
[Fix] Fix logprob and normalized_logprob (#1428)
|
2024-09-15 06:36:06 -07:00 |
|
Ying Sheng
|
712216928f
|
[Feature] Initial support for multi-LoRA serving (#1307)
|
2024-09-12 16:46:14 -07:00 |
|
wangchao
|
fec2d1223c
|
[Fix] fix bug of undefined is_single in meth create_abort_task (#1370)
|
2024-09-10 01:17:37 -07:00 |
|
Kaichen Zhang - NTU
|
662ecd9368
|
[Feat] Add modalities for vision server when handling pixel values for llava (#1346)
|
2024-09-09 02:07:34 -07:00 |
|
Zhiqiang Xie
|
8153168c96
|
fix data racing due to mutable reference using deepcopy (#1255)
|
2024-08-28 18:57:54 -07:00 |
|
Lianmin Zheng
|
bf53bf5142
|
[Fix] Fix llava on multi images (#1247)
|
2024-08-28 06:33:05 -07:00 |
|
Liangsheng Yin
|
83e23c69b3
|
Improve code style of sampler (#1168)
|
2024-08-21 16:48:24 -07:00 |
|
Shan Yu
|
cd10654e7e
|
[Feat] Support update weights without restart server (#1157)
|
2024-08-20 13:48:24 -07:00 |
|
yichuan~
|
b997a18d74
|
[Feat] Add support for optional start len of logprobs (#1035)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2024-08-18 23:45:41 -07:00 |
|
Lianmin Zheng
|
cdc8d60752
|
Improve the code style: more comments and remove useless packages (#1139)
|
2024-08-17 14:37:52 -07:00 |
|
Ying Sheng
|
b68c4c073b
|
fix: force max new tokens to be 1 for embedding request (#1019)
|
2024-08-10 13:46:42 -07:00 |
|
Ying Sheng
|
b16e856f11
|
Add openai embedding API (#997)
|
2024-08-09 11:19:18 -07:00 |
|
Ying Sheng
|
20a4f927dc
|
Add io struct for embedding models [unreachable code] - step 2/3 (#987)
|
2024-08-08 07:52:31 +00:00 |
|
yichuan~
|
d53dcf9c98
|
Support more OpenAI API test (#916)
|
2024-08-04 16:43:09 -07:00 |
|
Liangsheng Yin
|
cdcbde5fc3
|
Code structure refactor (#807)
|
2024-07-29 23:04:48 -07:00 |
|
yichuan~
|
084fa54d37
|
Add support for OpenAI API : offline batch(file) processing (#699)
Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
|
2024-07-29 13:07:18 -07:00 |
|
Yineng Zhang
|
dd7e8b9421
|
chore: add copyright for srt (#790)
|
2024-07-28 23:07:12 +10:00 |
|
Lianmin Zheng
|
30db99b3d9
|
Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776)
|
2024-07-27 19:50:34 -07:00 |
|
Yineng Zhang
|
6b32bb1c0b
|
misc: format (#741)
|
2024-07-26 21:00:51 +10:00 |
|
Toshiki Kataoka
|
da504445dc
|
fix /generate without sampling_params (#734)
|
2024-07-26 01:27:56 -07:00 |
|
Liangsheng Yin
|
f424e76d96
|
Fix illegal tokens during sampling (#676)
|
2024-07-20 03:11:15 -07:00 |
|
Lianmin Zheng
|
35759efa91
|
Support random dataset in bench_serving.py (#669)
|
2024-07-20 01:06:43 -07:00 |
|