Lianmin Zheng | 02f7f3e488 | Update the transformers version in CI (#1690) | 2024-10-16 19:03:55 -07:00
Zeng Zhongchao | 2782132be8 | Add date to logging messages (#1623) (#1679) | 2024-10-16 18:54:55 -07:00
Michael Feil | b0facb3316 | add orjson for jsonresponse (#1688) | 2024-10-16 18:14:30 -07:00
Lianmin Zheng | dbec2f1847 | Launch a thread to overlap CPU and GPU (#1687) | 2024-10-16 11:20:17 -07:00
Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00
Patrick Yi | 31fad29ab0 | Add get_tokenizer function for Engine class (#1653) | 2024-10-12 19:39:35 -07:00
Byron Hsu | 862cd265e5 | [engine] support async and streaming (#1614) | 2024-10-11 15:26:25 -07:00
Lianmin Zheng | 23cc66f7b6 | Add back data parallelism (#1635) | 2024-10-11 07:22:48 -07:00
科英 | bbd72bfc86 | Add the ability to enable and disable the Profiler via HTTP API. (#1626) | 2024-10-11 02:34:25 -07:00
Byron Hsu | e8613df071 | [Engine] Fix generate hanging issue after the first call (#1606) | 2024-10-08 04:26:56 +00:00
Byron Hsu | 565b05f02f | Use atexit hook to implicitly shutdown Runtime (#1595) | 2024-10-07 05:18:45 +00:00
Byron Hsu | 551a3a9d38 | Provide an offline engine API (#1567) | 2024-10-06 20:27:03 -07:00
Lianmin Zheng | 114bbc8651 | Use ipc instead of tcp in zmq (#1566) | 2024-10-04 00:45:52 -07:00
Lianmin Zheng | 32eb6e96f2 | Organize sampling batch info better (#1562) | 2024-10-03 18:29:49 -07:00
Lianmin Zheng | 63ba2f8d7b | Clean up batch data structures: Introducing ModelWorkerBatch (#1544) | 2024-09-30 06:41:49 -07:00
Lianmin Zheng | 048685430d | Improve process creation (#1534) | 2024-09-29 02:36:12 -07:00
Lianmin Zheng | 4e4459b91f | Multiple minor fixes (#1530) | 2024-09-28 14:43:35 -07:00
Ying Sheng | 9aa6553d2a | [Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525) | 2024-09-27 23:32:11 -07:00
HAI | 3a6e04185b | [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) | 2024-09-17 07:43:52 +00:00
Lianmin Zheng | 27b557aea7 | Clean up model loader (#1440) | 2024-09-16 18:16:27 -07:00
Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00
Lianmin Zheng | fec185ce0c | Refactor attention backend (#1381) | 2024-09-11 11:44:26 -07:00
Lianmin Zheng | c03cece42f | Improve error reporting during server launch (#1390) | 2024-09-11 04:50:04 -07:00
Lianmin Zheng | 46094e0c1b | Deprecate --disable-flashinfer and introduce --attention-backend (#1380) | 2024-09-10 17:11:16 -07:00
josephrocca | dff2860a69 | Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy (#1373) | 2024-09-11 02:35:03 +10:00
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00
Kai-Hsun Chen | 0836055324 | [Chore] Rename model_overide_args to model_override_args (#1284) | 2024-09-01 03:14:56 -07:00
  Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | 0a97d7962d | [Fix] Fix OOM in llava base class (#1249) | 2024-08-28 08:45:49 -07:00
Lianmin Zheng | bf53bf5142 | [Fix] Fix llava on multi images (#1247) | 2024-08-28 06:33:05 -07:00
Yineng Zhang | 198974cd1a | feat: support sm75 with FlashInfer v0.1.6 (#1233) | 2024-08-28 18:39:12 +10:00
caiyueliang | 2f1d92834f | [FEAT] Support batches cancel (#1222) | 2024-08-26 23:28:26 +00:00
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | 902278008a | [Minor] Improve the function organization in TokenizerManager & improve loggers (#1208) | 2024-08-25 14:46:34 -07:00
Chayenne | 30b4f771b0 | Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) | 2024-08-25 10:29:12 -07:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Ying Sheng | 1cb4da5c5f | [Fix] the issue of random order when input is a list (#1199) | 2024-08-24 21:43:03 -07:00
Lianmin Zheng | 5623826f73 | [Minor] Improve logging and rename the health check endpoint name (#1180) | 2024-08-21 19:24:36 -07:00
Lianmin Zheng | bea2bb9eea | Improve multi-node stability (#1171) | 2024-08-20 22:35:05 -07:00
Shan Yu | cd10654e7e | [Feat] Support update weights without restart server (#1157) | 2024-08-20 13:48:24 -07:00
Lucien | 6242c399ab | Generate 1 token to verify the health of the inference service in /health (#1154) | 2024-08-21 03:14:34 +10:00
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
yichuan~ | b997a18d74 | [Feat]Add support for optional start len of logprobs (#1035) | 2024-08-18 23:45:41 -07:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
  Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Lianmin Zheng | cdc8d60752 | Improve the code style: more comments and remove useless packages (#1139) | 2024-08-17 14:37:52 -07:00
Liangsheng Yin | 3694f8f996 | Mixed style of chunked prefill (#1013) | 2024-08-16 09:13:00 +00:00
Lianmin Zheng | 0cb099e20a | set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113) | 2024-08-16 03:47:39 +10:00
Lianmin Zheng | 326df4bab2 | Use a single workspace for flashinfer (#1077) | 2024-08-14 19:25:37 -07:00
Ying Sheng | 6767e2229f | Support jinja as chat template file (#1104) | 2024-08-14 17:43:14 -07:00
rainred | 616b59f384 | [Feature] modify Runtime to support skip_tokenizer_init (#1088) | 2024-08-14 00:28:04 -07:00
  Co-authored-by: lzhang <zhanglei@modelbest.cn>
Lianmin Zheng | c877292cc1 | Re-organize CI tests (#1052) | 2024-08-12 03:39:01 -07:00
Yineng Zhang | 94752ac811 | feat: use FlashInfer rmsnorm and silu (#907) | 2024-08-11 14:57:13 +10:00
gryffindor-rr | 9cf0a5bada | Add skip_tokenizer_init args. (#959) | 2024-08-09 12:14:13 -07:00
  Co-authored-by: lzhang <zhanglei@modelbest.cn>
Ying Sheng | b16e856f11 | Add openai embedding API (#997) | 2024-08-09 11:19:18 -07:00
liuyhwangyh | b91a4cb1b1 | support models from www.modelscope.cn (#994) | 2024-08-09 02:52:14 -07:00
  Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>